Abstract

In the last decade, sequential Monte-Carlo methods (SMC) emerged as
a key tool in computational statistics (see for instance Doucet et al. (2001),
Liu (2001), Künsch (2001)). These algorithms approximate a sequence of
distributions by a sequence of weighted empirical measures associated to a 
weighted population of particles. These particles and weights are generated 
recursively according to elementary transformations: mutation and selec- 
tion. Examples of applications include the sequential Monte-Carlo tech- 
niques to solve optimal non-linear filtering problems in state-space models, 
molecular simulation, genetic optimization, etc. 

Despite many theoretical advances (see for instance Gilks and Berzuini
(2001), Künsch (2003), Del Moral (2004), Chopin (2004)), the asymptotic
properties of these approximations remain a question of central
interest. In this paper, we analyze sequential Monte Carlo methods from
an asymptotic perspective, that is, we establish a law of large numbers and
an invariance principle as the number of particles gets large. We introduce the
concepts of weighted sample consistency and asymptotic normality, and derive
conditions under which the mutation and the selection procedures used
in the sequential Monte-Carlo build-up preserve these properties. To illustrate
our findings, we analyze SMC algorithms to approximate the filtering
distribution in state-space models. We show how our techniques allow us to
relax restrictive technical conditions used in previously reported works and
provide grounds to analyze more sophisticated sequential sampling strategies.



Short title: Limit theorems for SMC. 



1 Introduction 

Sequential Monte Carlo (SMC) refers to a class of methods designed to approximate
a sequence of probability distributions over a sequence of probability spaces
by a set of points, termed particles, that each have an assigned non-negative
weight and are updated recursively in time. SMC methods can be seen as
a combination of the sequential importance sampling method introduced in
Handschin and Mayne (1969) and the sampling importance resampling algorithm
proposed in Rubin (1987): they use a combination of mutation and selection
steps. In the mutation step, the particles are propagated forward in time using
proposal kernels and their importance weights are updated taking into account 
the targeted distribution. In the selection (or resampling) step, particles mul- 
tiply or die depending on their fitness measured by their importance weights. 
Many algorithms have been proposed since, which differ in the way the particles 
and the importance weights evolve and adapt. 

SMC methods have a long history in molecular simulations, where they have
been found to be one of the most powerful means for the simulation and
optimization of chain polymers (see for instance Landau and Binder (2000)).
SMC methods have more recently emerged as a key tool to solve on-line
prediction / filtering / smoothing problems in a dynamic system. Simple yet
flexible SMC methods have been shown to overcome the numerical difficulties and
pitfalls typically encountered with traditional methods based on approximate
non-linear filtering (such as the extended Kalman filter or Gaussian-sum
filters); see for instance Liu and Chen (1998), Liu (2001), Doucet et al. (2001),
Ristic et al. (2004) and the references therein. More recently, SMC methods
have been shown to be a promising alternative to Markov Chain Monte Carlo
techniques for sampling complex distributions over large dimensional spaces; see
for instance Gilks and Berzuini (2001) and Cappé et al. (2005).

In this paper, we focus on the asymptotic behavior of the weighted particle
approximation as the number of particles tends to infinity. Because the particles
interact during the selection steps, they are not independent, which makes the
analysis of particle approximations a challenging area of research. This topic has
attracted a great deal of effort in recent years, making it a daunting
task to give credit to every contribution. The first rigorous convergence result
was obtained in Del Moral (1996), who established the almost-sure convergence
of an elementary SMC algorithm (the so-called bootstrap filter). A central limit
theorem for this algorithm was derived in Del Moral and Guionnet (1999) and
refined in Del Moral and Miclo (2000). The proof of the CLT was later simplified
and extended to more general SMC algorithms by Künsch (2003) and
Chopin (2004). Bounds on the fluctuations of the particle approximations for
different norms were reported in Crisan and Lyons (1997), Del Moral and Miclo
(2000) and Crisan and Doucet (2002). Del Moral (2004) provides an up-to-date
and thorough coverage of recent theoretical developments in this area.

With few exceptions (see Chopin (2004) and, to a lesser extent, Crisan and Doucet
(2002) and Künsch (2003)), these results apply under simplifying assumptions
on the way particles are mutated and selected, which restrict the scope of
applicability of the results to the most elementary SMC implementations. In
particular, all these results assume that selection is performed at each iteration,
which implies that the weights are not propagated. This is clearly an annoying
limitation since it has been noticed in practice that resampling the particle
system at each time step is most often not a clever choice. As discussed for
example in (Liu and Chen, 1998, section 2), when the weights are nearly constant,
resampling only reduces the number of distinct particles and introduces extra
Monte Carlo variations. Resampling should only be applied when the weights
are very skewed: carrying many particles with very small importance weights is
indeed a waste. Resampling gives good particles a chance to replicate and
hence rejuvenates the sampler to produce better future particles.

The main purpose of this paper is to derive an asymptotic theory of weighted
systems of particles. To the best of our knowledge, limit theorems for such weighted
approximations were only considered in Liu and Chen (1998), who mostly sketched
consistency proofs. In this paper, we establish both a law of large numbers and
central limit theorems, under assumptions that are presumably close to being
minimal. These results apply to the many different implementations of
the SMC algorithms, including rather sophisticated schemes such as the
resample and move algorithm of Berzuini and Gilks (2001) or the auxiliary particle filter
by Pitt and Shephard (1999). They cover resampling schedules (when to resample)
that can be either deterministic or dynamic, i.e. based on the distribution
of the importance weights at the current iteration. They also cover sampling
schemes that can be simple random sampling (with weights), residual
sampling (Liu and Chen, 1998) or auxiliary sampling (Pitt and Shephard,
1999). We do not impose a specific structure on the sequence of target
probability measures; therefore our results apply not only to sequential filtering or
smoothing in state-space contexts, but also to recent algorithms developed for
population Monte Carlo or for molecular simulation.

The paper is organized as follows. In section 2 we introduce the definitions
of weighted sample consistency and asymptotic normality; we then discuss the
meaning of these definitions in a simple situation. In section 3 we present and
discuss the conditions upon which consistency and/or asymptotic normality of a
weighted sample is preserved during the mutation and selection steps. In section
4 we apply the result to the estimation of the joint smoothing distribution for
a state-space model. In particular, we establish a central limit theorem for an
SMC method involving a dynamic resampling scheme. These results are based
on general results on triangular arrays of martingale increments (in the sense of
(Shiryaev, 1996, Section II.7)) which are established in section A.



2 Notations and Definitions 

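All the random variables are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
A state space $\mathsf{X}$ is said to be general if it is equipped with a countably generated
$\sigma$-field $\mathcal{B}(\mathsf{X})$. For a general state space $\mathsf{X}$, we denote by $\mathcal{P}(\mathsf{X})$ the set of
probability measures on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ and by $\mathbb{B}(\mathsf{X})$ (resp. $\mathbb{B}^+(\mathsf{X})$) the set of all
$\mathcal{B}(\mathsf{X})/\mathcal{B}(\mathbb{R})$-measurable (resp. non-negative) functions from $\mathsf{X}$ to $\mathbb{R}$ equipped with the Borel
$\sigma$-field $\mathcal{B}(\mathbb{R})$. For any $\mu \in \mathcal{P}(\mathsf{X})$ and $f \in \mathbb{B}(\mathsf{X})$ satisfying $\int_{\mathsf{X}} \mu(dx)|f(x)| < \infty$,
$\mu(f)$ denotes $\int_{\mathsf{X}} f(x)\mu(dx)$. Let $\mathsf{X}$ and $\mathsf{Y}$ be two general state spaces. A kernel
$V$ from $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ to $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ is a map from $\mathsf{X} \times \mathcal{B}(\mathsf{Y})$ into $[0, 1]$ such that for
each $A \in \mathcal{B}(\mathsf{Y})$, $x \mapsto V(x, A)$ is a nonnegative bounded measurable function on
$\mathsf{X}$ and for each $x \in \mathsf{X}$, $A \mapsto V(x, A)$ is a measure on $\mathcal{B}(\mathsf{Y})$. We say that $V$ is
finite if $V(x, \mathsf{Y}) < \infty$ for any $x \in \mathsf{X}$; it is Markovian if $V(x, \mathsf{Y}) = 1$ for any $x \in \mathsf{X}$.
For any function $f \in \mathbb{B}(\mathsf{X} \times \mathsf{Y})$ such that $\int_{\mathsf{Y}} V(x, dy)|f(x, y)| < \infty$ we denote by
$V(\cdot, f)$ the function

\[ V(\cdot, f) : x \mapsto V(x, f) \stackrel{\mathrm{def}}{=} \int_{\mathsf{Y}} V(x, dy) f(x, y) . \tag{1} \]

The function $V(\cdot, f)$ belongs to $\mathbb{B}(\mathsf{X})$. We sometimes use the abridged notation
$Vf$ instead of $V(\cdot, f)$. For $\nu$ a measure on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$, we denote by $\nu V$ the
measure on $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ defined for any $A \in \mathcal{B}(\mathsf{Y})$ by $\nu V(A) = \int_{\mathsf{X}} \nu(dx) V(x, A)$.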

Let $\Xi$ be a general state space, $\mu$ be a probability measure on $(\Xi, \mathcal{B}(\Xi))$,
$\{M_N\}_{N \geq 0}$ be a sequence of integers, and $\mathsf{C}$ be a subset of $L^1(\Xi, \mu)$. We
approximate the probability measure $\mu$ by points $\xi_{N,i} \in \Xi$, $i = 1, \dots, M_N$, associated
to non-negative weights $\omega_{N,i} \geq 0$.

Definition 1. A weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ on $\Xi$ is said to be
consistent for the probability measure $\mu$ and the set $\mathsf{C} \subseteq L^1(\Xi, \mu)$ if for any $f \in \mathsf{C}$,
as $N \to \infty$,

\[ \Omega_N^{-1} \sum_{i=1}^{M_N} \omega_{N,i} f(\xi_{N,i}) \stackrel{P}{\longrightarrow} \mu(f) , \tag{2} \]

\[ \Omega_N^{-1} \max_{1 \leq i \leq M_N} \omega_{N,i} \stackrel{P}{\longrightarrow} 0 , \quad \text{where } \Omega_N = \sum_{i=1}^{M_N} \omega_{N,i} . \tag{3} \]

This definition of weighted sample consistency is similar to the notion of
properly weighted sample introduced in Liu and Chen (1998). The difference
stems from the condition (3), which implies that the contribution of each
individual term in the sum vanishes in the limit as $N \to \infty$, a condition often
referred to as smallness in the literature.

Example 1 (Importance Sampling). To illustrate the meaning of these
conditions, consider the importance sampling estimator. Let $\mu$ (the target
distribution) and $\nu$ (the proposal distribution) be known (perhaps up to a
normalizing constant) probability distributions on $(\Xi, \mathcal{B}(\Xi))$. Suppose that $\mu$ is
absolutely continuous with respect to $\nu$ and denote by $W$ the importance function,
$W = \beta \, d\mu/d\nu$ for some $\beta > 0$. Then, the weighted sample $\{(\xi_{N,i}, W(\xi_{N,i}))\}_{1 \leq i \leq N}$,
where $\{\xi_{N,i}\}_{1 \leq i \leq N}$ are i.i.d. $\nu$-distributed, is consistent for $\{\mu, L^1(\Xi, \mu)\}$. Eq. (2)
follows from the law of large numbers. Writing $\omega_{N,i} = W(\xi_{N,i})$, for any $\epsilon, C > 0$,

\[ N^{-1} \mathbb{E}\Big[ \max_{1 \leq i \leq N} \omega_{N,i} \Big] \leq \epsilon + N^{-1} \sum_{i=1}^{N} \mathbb{E}\big[ \omega_{N,i} \mathbb{1}\{\omega_{N,i} \geq \epsilon N\} \big] \leq \epsilon + \nu\big( W \mathbb{1}\{W \geq C\} \big) \]

for sufficiently large $N$. Because $\nu(W) = \beta$, $\nu(W \mathbb{1}\{W \geq C\})$ converges to zero
as $C \to \infty$, and (3) follows because $\Omega_N / N \stackrel{P}{\longrightarrow} \beta$.
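The two conditions of Definition 1 can be observed numerically on this example. The following Python sketch is only an illustration: the Gaussian target $\mu = N(1, 1)$, the Gaussian proposal $\nu = N(0, 2^2)$, the constant $\beta = 3$ and all function names are assumptions made for the example and do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of the N(mean, std^2) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sample(n, rng):
    """Draw n particles from nu = N(0, 2^2), weighted for mu = N(1, 1)."""
    particles = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # W = beta * dmu/dnu; beta = 3 is arbitrary and cancels after normalization.
    weights = [3.0 * normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)
               for x in particles]
    return particles, weights


def weighted_mean(particles, weights, f):
    """Self-normalized estimate appearing on the left-hand side of (2)."""
    total = sum(weights)
    return sum(w * f(x) for x, w in zip(particles, weights)) / total


def max_weight_share(weights):
    """Largest normalized weight, the quantity controlled by (3)."""
    return max(weights) / sum(weights)


rng = random.Random(0)
xs, ws = importance_sample(20000, rng)
estimate = weighted_mean(xs, ws, lambda x: x)   # should be close to mu(f) = 1
share = max_weight_share(ws)                    # should be close to 0
```

With a bounded importance function, as here, the largest normalized weight decays like $1/N$; heavier-tailed weights make the smallness condition (3) slower to take effect.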

Of course, importance sampling is only one of the many techniques
that can be used to obtain consistent weighted samples; other approaches will
be considered below. For more complex sampling schemes $\mathsf{C}$ can be a proper
subset of $L^1(\Xi, \mu)$. In order to obtain sensible results we restrict our attention
to classes of sets which are sufficiently rich.

Definition 2 (Proper Set). A set $\mathsf{C}$ of real-valued measurable functions on $\Xi$
is said to be proper if the following conditions are satisfied.

(i) $\mathsf{C}$ is a linear space: for any $f$ and $g$ in $\mathsf{C}$ and reals $\alpha$ and $\beta$, $\alpha f + \beta g \in \mathsf{C}$,

(ii) If $g \in \mathsf{C}$ and $f$ is measurable with $|f| \leq |g|$, then $|f| \in \mathsf{C}$,

(iii) For all $c$, the constant function $f \equiv c$ belongs to $\mathsf{C}$.

For any function $f$, define the positive and negative parts of it by

\[ f^+ = f \vee 0 \quad \text{and} \quad f^- = (-f) \vee 0 , \]

and note that $f^+$ and $f^-$ are both dominated by $|f|$. Thus, if $|f| \in \mathsf{C}$ then $f^+$
and $f^-$ both belong to $\mathsf{C}$ and so does $f = f^+ - f^-$. It is easily seen that for any
$p \geq 0$ and any measure $\mu$ on $(\Xi, \mathcal{B}(\Xi))$, the set $L^p(\Xi, \mu)$ is proper.

A classical way to strengthen consistency is to consider distributional
convergence of the normalized difference. Recall that (see Aldous and Eagleson
(1978), (Hall and Heyde, 1981, chapter 3)) if a sequence of rv's $\{X_N\}$ converges in
distribution to $X$ (written $X_N \stackrel{\mathcal{D}}{\longrightarrow} X$), the convergence is said to be stable
(written $X_N \stackrel{\mathcal{D}}{\longrightarrow} X$ (stably)), if for any set $B \in \mathcal{F}$ and for a countable dense set
of points $x \in \mathbb{R}$,

\[ \lim_{N \to \infty} \mathbb{P}(X_N \leq x, B) \quad \text{exists.} \]

In other words, for all events $B \in \mathcal{F}$ such that $\mathbb{P}(B) > 0$, the distribution of $X_N$
conditional on $B$ converges in distribution to some (proper) distribution which
may depend on $B$. The convergence is mixing (written $X_N \stackrel{\mathcal{D}}{\longrightarrow} X$ (mixing)) if
in a stable limit result $X_N \stackrel{\mathcal{D}}{\longrightarrow} X^*$ (stably) the limit random variable $X^*$ can
be chosen to be independent of $\mathcal{F}$. Mixing is in particular useful when studying
random sum limit theorems.

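Let $\mu$ be a probability measure on $(\Xi, \mathcal{B}(\Xi))$, $\gamma$ be a finite measure on
$(\Xi, \mathcal{B}(\Xi))$, $\mathsf{A} \subseteq L^1(\Xi, \mu)$ and $\mathsf{W} \subseteq L^1(\Xi, \gamma)$ be sets of real-valued measurable
functions on $\Xi$, $\sigma$ a real non negative function on $\mathsf{A}$, and $\{a_N\}$ be a
non-decreasing real sequence diverging to infinity.

Definition 3. A weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ on $\Xi$ is said to be
asymptotically normal for $(\mu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{a_N\})$ if

\[ a_N \Omega_N^{-1} \sum_{i=1}^{M_N} \omega_{N,i} \{ f(\xi_{N,i}) - \mu(f) \} \stackrel{\mathcal{D}}{\longrightarrow} \mathrm{N}\{0, \sigma^2(f)\} \ \text{(mixing)} \quad \text{for any } f \in \mathsf{A} , \tag{4} \]

\[ a_N^2 \Omega_N^{-2} \sum_{i=1}^{M_N} \omega_{N,i}^2 f(\xi_{N,i}) \stackrel{P}{\longrightarrow} \gamma(f) \quad \text{for any } f \in \mathsf{W} , \tag{5} \]

\[ a_N \Omega_N^{-1} \max_{1 \leq i \leq M_N} \omega_{N,i} \stackrel{P}{\longrightarrow} 0 . \tag{6} \]

We stress that the rate $\{a_N\}$ can be different from $\sqrt{M_N}$ because of the
dependence among the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ introduced by the
different transformations undergone by the weighted sample.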

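Example 2 (Importance Sampling (continued)). The rationale for the
conditions (5) and (6) is best understood by considering again the importance
sampling example. Assume that $\nu(W^2) < \infty$, where as above $W = \beta \, d\mu/d\nu$. Define
the subsets $\mathsf{A} \subseteq \mathbb{B}(\Xi)$ and $\mathsf{W} \subseteq \mathbb{B}(\Xi)$,

\[ \mathsf{A} \stackrel{\mathrm{def}}{=} \{ f \in \mathbb{B}(\Xi) : \nu(W^2 f^2) < \infty \} \quad \text{and} \quad \mathsf{W} \stackrel{\mathrm{def}}{=} \{ f \in \mathbb{B}(\Xi) : \nu(W^2 |f|) < \infty \} . \]

For $f \in \mathsf{A}$, define $S_N(f) = \Omega_N^{-1} \sum_{i=1}^{N} \omega_{N,i} [ f(\xi_{N,i}) - \mu(f) ]$. Using $\Omega_N / N \stackrel{P}{\longrightarrow} \beta$
and (Aldous and Eagleson, 1978, Theorem 2), $N^{1/2} S_N(f) \stackrel{\mathcal{D}}{\longrightarrow} S$ (mixing),
where $S$ is a Gaussian random variable with zero mean and variance

\[ \sigma^2(f) \stackrel{\mathrm{def}}{=} \nu \left\{ \left( \frac{d\mu}{d\nu} \right)^2 [ f - \mu(f) ]^2 \right\} . \tag{7} \]

The law of large numbers for i.i.d. sequences implies that for any $f \in \mathsf{W}$,

\[ N \Omega_N^{-2} \sum_{i=1}^{N} \omega_{N,i}^2 f(\xi_{N,i}) = \left( \frac{N}{\Omega_N} \right)^2 N^{-1} \sum_{i=1}^{N} W^2(\xi_{N,i}) f(\xi_{N,i}) \stackrel{P}{\longrightarrow} \gamma(f) \stackrel{\mathrm{def}}{=} \nu \left[ \left( \frac{d\mu}{d\nu} \right)^2 f \right] , \]

showing (5). For any $\epsilon, C > 0$,

\[ N^{-1} \mathbb{E}\Big[ \max_{1 \leq i \leq N} \omega_{N,i}^2 \Big] \leq \epsilon^2 + N^{-1} \sum_{i=1}^{N} \mathbb{E}\big[ \omega_{N,i}^2 \mathbb{1}\{\omega_{N,i}^2 \geq \epsilon^2 N\} \big] \leq \epsilon^2 + \nu\big( W^2 \mathbb{1}\{W^2 \geq C\} \big) \]

for sufficiently large $N$. Because $\mathsf{A}$ is a proper set, $\nu(W^2 \mathbb{1}\{W^2 \geq C\})$ goes to zero
as $C \to \infty$. Using again $\Omega_N / N \stackrel{P}{\longrightarrow} \beta$, the previous display implies that

\[ N^{1/2} \Omega_N^{-1} \max_{1 \leq i \leq N} \omega_{N,i} \stackrel{P}{\longrightarrow} 0 , \tag{8} \]

showing (6): any individual term in the sum is small compared to the square
root of the variance, a condition which is known to be necessary for a central
limit theorem to hold for triangular arrays of independent random variables, see
Petrov (1995). Stability in the limit can be used to deal with situations where
the number of terms in the sum is random (see Csörgő and Fischler).
This is considered in a companion paper.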
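The limiting variance (7) can be compared with a plug-in estimate built from the weighted sample itself, in the spirit of condition (5). The sketch below is an illustration under assumed choices that are not in the paper: the target $\mu = N(1, 1)$, the proposal $\nu = N(0, 2^2)$, the function $f(x) = x$, the constant $\beta = 5$, the midpoint quadrature and all function names.

```python
import math
import random


def density_ratio(x):
    """dmu/dnu for mu = N(1, 1) and nu = N(0, 2^2) (illustrative choice)."""
    return 2.0 * math.exp(-0.5 * (x - 1.0) ** 2 + x * x / 8.0)


def nu_pdf(x):
    """Density of the proposal nu = N(0, 2^2)."""
    return math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2.0 * math.pi))


def exact_variance(f, mu_f, lo=-15.0, hi=15.0, steps=60000):
    """Midpoint-rule value of nu{(dmu/dnu)^2 [f - mu(f)]^2}, cf. (7)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += nu_pdf(x) * density_ratio(x) ** 2 * (f(x) - mu_f) ** 2
    return total * h


def plugin_variance(particles, weights, f):
    """N * sum_i w_i^2 (f(x_i) - estimate)^2 / (sum_i w_i)^2."""
    n = len(particles)
    total = sum(weights)
    mean = sum(w * f(x) for x, w in zip(particles, weights)) / total
    squares = sum((w * (f(x) - mean)) ** 2 for x, w in zip(particles, weights))
    return n * squares / total ** 2


rng = random.Random(1)
xs = [rng.gauss(0.0, 2.0) for _ in range(50000)]
ws = [5.0 * density_ratio(x) for x in xs]     # beta = 5 cancels in the ratio
sigma2_exact = exact_variance(lambda x: x, 1.0)
sigma2_plugin = plugin_variance(xs, ws, lambda x: x)
```

The plug-in quantity only uses the particles and their unnormalized weights, so it remains available when $\beta$ is unknown.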



3 Main results 

To analyze the sequential Monte Carlo methods discussed in the introduction,
we now need to study how the mutation step and the resampling step affect
a consistent and/or asymptotically normal weighted sample.






3.1 Mutation 



To study SISR algorithms, we need first to show that when moving the particles
using a Markov kernel and then assigning them appropriately defined importance
weights, we transform a weighted sample consistent (or asymptotically normal)
for a distribution $\nu$ on a general state space $(\Xi, \mathcal{B}(\Xi))$ into a weighted sample
consistent (or asymptotically normal) for a distribution $\mu$ on $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$. Let
$L$ be a kernel from $(\Xi, \mathcal{B}(\Xi))$ to $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$ such that $\nu L(\tilde{\Xi}) > 0$ and for any
$f \in \mathbb{B}(\tilde{\Xi})$,

\[ \mu(f) = \frac{\nu L(f)}{\nu L(\tilde{\Xi})} = \frac{\iint \nu(d\xi) L(\xi, d\tilde{\xi}) f(\tilde{\xi})}{\int \nu(d\xi) L(\xi, \tilde{\Xi})} . \tag{9} \]

There exist of course many such kernels: one may set for example $L(\xi, A) = \mu(A)$
for any $\xi \in \Xi$, but, as we will see below, this is not usually the most appropriate
choice.

We wish to transform a weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ targeting the
distribution $\nu$ on $(\Xi, \mathcal{B}(\Xi))$ into a weighted sample $\{(\tilde{\xi}_{N,i}, \tilde{\omega}_{N,i})\}_{1 \leq i \leq \tilde{M}_N}$ targeting
$\mu$ on $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$. The number of particles $\tilde{M}_N$ is set to be $\tilde{M}_N = \alpha_N M_N$, where
$\alpha_N$ is the number of offspring of each particle. The use of multiple offspring for
a particle has been suggested in the context of SIR by Rubin (1987): when the
mutation step is followed by a resampling step, an increase in the number of
distinct particles in the mutation step will typically increase the number of distinct
particles after the resampling step. In the sequential context, this operation
is regarded as a practical means of combating particle impoverishment. These
offspring, indexed by $k = 1, \dots, \alpha_N$, are proposed using a Markov kernel $R$ from $(\Xi, \mathcal{B}(\Xi))$
to $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$. Implicitly, we suppose that sampling from the proposal kernel $R$
is doable.

Most importantly, we assume that for any $\xi \in \Xi$, the measure
$L(\xi, \cdot)$ on $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$ is absolutely continuous with respect to $R(\xi, \cdot)$, which we denote
$L(\xi, \cdot) \ll R(\xi, \cdot)$, and define

\[ W(\xi, \tilde{\xi}) \stackrel{\mathrm{def}}{=} \frac{dL(\xi, \cdot)}{dR(\xi, \cdot)}(\tilde{\xi}) . \tag{10} \]

The new weighted sample $\{(\tilde{\xi}_{N,i}, \tilde{\omega}_{N,i})\}_{1 \leq i \leq \tilde{M}_N}$ is constructed as follows. We
draw new particle positions $\{\tilde{\xi}_{N,j}\}_{1 \leq j \leq \tilde{M}_N}$ conditionally independently given
$\mathcal{F}_{N,0} = \sigma(\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N})$ with distribution given for $i = 1, \dots, M_N$,
$k = 1, \dots, \alpha_N$ and $A \in \mathcal{B}(\tilde{\Xi})$ by

\[ \mathbb{P}\big( \tilde{\xi}_{N, \alpha_N (i-1) + k} \in A \,\big|\, \mathcal{F}_{N,0} \big) = R(\xi_{N,i}, A) \tag{11} \]

and associate to each new particle position the importance weight:

\[ \tilde{\omega}_{N, \alpha_N (i-1) + k} = \omega_{N,i} W(\xi_{N,i}, \tilde{\xi}_{N, \alpha_N (i-1) + k}) , \quad \text{where } i = 1, \dots, M_N , \ k = 1, \dots, \alpha_N . \tag{12} \]

The mutation step is unbiased in the sense that, for any $f \in \mathbb{B}(\tilde{\Xi})$ and
$i = 1, \dots, M_N$,

\[ \sum_{j = \alpha_N (i-1) + 1}^{\alpha_N i} \mathbb{E}\big[ \tilde{\omega}_{N,j} f(\tilde{\xi}_{N,j}) \,\big|\, \mathcal{F}_{N,j-1} \big] = \alpha_N \omega_{N,i} L(\xi_{N,i}, f) , \tag{13} \]

where for $j = 1, \dots, \tilde{M}_N$, $\mathcal{F}_{N,j} = \mathcal{F}_{N,0} \vee \sigma(\{\tilde{\xi}_{N,k}\}_{1 \leq k \leq j})$. The following theorems
state conditions under which the mutation step described above preserves the
weighted sample consistency.
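The construction (11)-(12) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `mutate`, its arguments, the Gaussian random-walk proposal and the weight function $W(\xi, \tilde{\xi}) = \exp(-\tilde{\xi}^2/2)$ (which corresponds to the hypothetical choice $L(\xi, d\tilde{\xi}) = R(\xi, d\tilde{\xi}) \exp(-\tilde{\xi}^2/2)$) are all assumptions.

```python
import math
import random


def mutate(particles, weights, alpha, propose, weight_fn, rng):
    """One mutation step with alpha offspring per particle.

    propose(x, rng) draws from R(x, .); weight_fn(x, y) evaluates W(x, y).
    Offspring of the i-th particle (0-based) occupy positions
    alpha * i, ..., alpha * i + alpha - 1, mirroring the indexing in (11)-(12).
    """
    new_particles, new_weights = [], []
    for x, w in zip(particles, weights):
        for _ in range(alpha):
            y = propose(x, rng)                      # draw from R(x, .), eq. (11)
            new_particles.append(y)
            new_weights.append(w * weight_fn(x, y))  # omega * W(x, y), eq. (12)
    return new_particles, new_weights


rng = random.Random(0)
particles = [0.0, 1.0, 2.0]
weights = [1.0, 1.0, 2.0]
new_particles, new_weights = mutate(
    particles, weights, alpha=2,
    propose=lambda x, r: r.gauss(x, 1.0),            # random-walk proposal (assumed)
    weight_fn=lambda x, y: math.exp(-0.5 * y * y),   # hypothetical W(x, y)
    rng=rng,
)
```

The parent weights are carried over multiplicatively, so no resampling is implied by this step; the sample size simply grows from $M_N$ to $\alpha_N M_N$.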

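Theorem 1. Let $\nu$ be a probability measure on $(\Xi, \mathcal{B}(\Xi))$, $\mu$ be a probability
measure on $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$. Let $L$ be a finite kernel from $(\Xi, \mathcal{B}(\Xi))$ to $(\tilde{\Xi}, \mathcal{B}(\tilde{\Xi}))$
satisfying $\nu L(\tilde{\Xi}) > 0$ and, for any $A \in \mathcal{B}(\tilde{\Xi})$, $\mu(A) = \nu L(A) / \nu L(\tilde{\Xi})$. Assume that

(i) the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ is consistent for $(\nu, \mathsf{C})$, where
$\mathsf{C} \subseteq L^1(\Xi, \nu)$ is a proper set,

(ii) the function $L(\cdot, \tilde{\Xi})$ belongs to $\mathsf{C}$,

(iii) for any $\xi \in \Xi$, $L(\xi, \cdot) \ll R(\xi, \cdot)$.

Then, $\tilde{\mathsf{C}}$ is a proper set and the weighted sample $\{(\tilde{\xi}_{N,i}, \tilde{\omega}_{N,i})\}_{1 \leq i \leq \tilde{M}_N}$ defined by
(11) and (12) is consistent for $(\mu, \tilde{\mathsf{C}})$, where

\[ \tilde{\mathsf{C}} \stackrel{\mathrm{def}}{=} \{ f \in L^1(\tilde{\Xi}, \mu) , \ L(\cdot, |f|) \in \mathsf{C} \} . \tag{14} \]

We now turn to prove the asymptotic normality.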

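Theorem 2. Suppose that the assumptions of Theorem 1 hold. Let $\gamma$ be a finite
measure on $(\Xi, \mathcal{B}(\Xi))$, $\mathsf{A} \subseteq L^1(\Xi, \nu)$ and $\mathsf{W} \subseteq L^1(\Xi, \gamma)$ be proper sets, $\sigma$ be a
non negative function on $\mathsf{A}$. Assume in addition that

(i) the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ is asymptotically normal for $(\nu,
\mathsf{A}, \mathsf{W}, \sigma, \gamma, \{M_N^{1/2}\})$,

(ii) the function $R(\cdot, W^2)$ belongs to $\mathsf{W}$.

Then, if $\alpha_N \to \alpha$, the weighted sample $\{(\tilde{\xi}_{N,i}, \tilde{\omega}_{N,i})\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically
normal for $(\mu, \tilde{\mathsf{A}}, \tilde{\mathsf{W}}, \tilde{\sigma}, \tilde{\gamma}, \{M_N^{1/2}\})$ with

\[ \tilde{\mathsf{A}} \stackrel{\mathrm{def}}{=} \{ f \in L^1(\tilde{\Xi}, \mu) : L(\cdot, |f|) \in \mathsf{A} \text{ and } R(\cdot, W^2 f^2) \in \mathsf{W} \} , \]
\[ \tilde{\mathsf{W}} \stackrel{\mathrm{def}}{=} \{ f \in L^1(\tilde{\Xi}, \mu) : R(\cdot, W^2 |f|) \in \mathsf{W} \} , \]
\[ \tilde{\sigma}^2(f) \stackrel{\mathrm{def}}{=} \sigma^2 \{ \bar{L}[f - \mu(f)] \} + \alpha^{-1} \gamma R \big\{ \big[ \bar{W}[f - \mu(f)] - R(\cdot, \bar{W}[f - \mu(f)]) \big]^2 \big\} , \quad \text{for all } f \in \tilde{\mathsf{A}} , \]
\[ \tilde{\gamma}(f) \stackrel{\mathrm{def}}{=} \alpha^{-1} \gamma R(\bar{W}^2 f) , \quad \text{for all } f \in \tilde{\mathsf{W}} , \]

where $\bar{L} \stackrel{\mathrm{def}}{=} [\nu L(\tilde{\Xi})]^{-1} L$ and $\bar{W} \stackrel{\mathrm{def}}{=} [\nu L(\tilde{\Xi})]^{-1} W$. Moreover, $\tilde{\mathsf{A}}$ and $\tilde{\mathsf{W}}$ are proper
sets.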

Remark 1. The use of different proposal kernels $R_k$ is sometimes advisable,
to add variety in the type of moves that are applied to the particle system.
The idea of using multiple types of moves is exploited in particular in adaptive
importance sampling (see Douc et al. (2005) for an illustration). The results
above can be adapted directly when the number of offspring $\alpha_N = \alpha$ does not
depend on $N$. To handle this case, define the average kernel $R = \alpha^{-1} \sum_{k=1}^{\alpha} R_k$
and suppose that for any $\xi \in \Xi$, $L(\xi, \cdot) \ll R(\xi, \cdot)$ (note that $L(\xi, \cdot)$ need not
be absolutely continuous with respect to each individual proposal kernel $R_k(\xi, \cdot)$). For
$i = 1, \dots, M_N$, $k = 1, \dots, \alpha_N$ and $A \in \mathcal{B}(\tilde{\Xi})$, we draw the new particle positions
according to

\[ \mathbb{P}\big( \tilde{\xi}_{N, \alpha_N (i-1) + k} \in A \,\big|\, \mathcal{F}_{N,0} \big) = R_k(\xi_{N,i}, A) . \tag{15} \]

We associate to each particle the importance weight,

\[ \tilde{\omega}_{N, \alpha_N (i-1) + k} = \omega_{N,i} W(\xi_{N,i}, \tilde{\xi}_{N, \alpha_N (i-1) + k}) , \quad \text{where } i = 1, \dots, M_N , \ k = 1, \dots, \alpha_N , \tag{16} \]

where $W$ is defined as in (10). With these definitions and notations, the results
of Theorems 1 and 2 continue to hold.

To illustrate the meaning of these statements, consider again the elementary
example of importance sampling. It turns out that reweighting the particles
without moving them is a particular case of mutation. Thus, Theorems 1 and 2
apply in this context.

Example 3. Let $\nu$ and $\mu$ be two probability measures on $(\Xi, \mathcal{B}(\Xi))$. Assume
that $\mu \ll \nu$. Set $L(\xi, \cdot) = \beta \frac{d\mu}{d\nu}(\xi) \delta_\xi$, for some $\beta > 0$, where $\delta_\xi$ is the Dirac
measure. With this definition of $L$, we obviously have $\mu(\cdot) = \nu L(\cdot) / \nu L(\Xi)$. If
$\{\xi_{N,i}\}_{1 \leq i \leq M_N}$ are independent $\nu$-distributed random variables, then the weighted
sample $\{(\xi_{N,i}, 1)\}_{1 \leq i \leq M_N}$ is consistent for $(\nu, L^1(\Xi, \nu))$. Put for $\xi \in \Xi$ and $A \in
\mathcal{B}(\Xi)$, $R(\xi, A) = \delta_\xi(A)$. Then, Theorem 1 shows that $\{(\xi_{N,i}, W(\xi_{N,i}, \xi_{N,i}))\}_{1 \leq i \leq M_N}$,
where $W(\xi, \tilde{\xi}) = \beta \frac{d\mu}{d\nu}(\xi)$, is consistent for $\{\mu, L^1(\Xi, \mu)\}$. We now turn to check
the asymptotic normality. From the CLT for i.i.d. sequences, the weighted sample
$\{(\xi_{N,i}, 1)\}_{1 \leq i \leq M_N}$ is asymptotically normal for $(\nu, L^2(\Xi, \nu), L^1(\Xi, \nu), \sigma,
\nu, \{M_N^{1/2}\})$, where for $f \in \mathsf{A} \stackrel{\mathrm{def}}{=} L^2(\Xi, \nu)$, $\sigma^2(f) = \nu\{[f - \nu(f)]^2\}$. Suppose
that $\nu(W^2) < \infty$. Because $R(\cdot, W^2) = W^2 \in \mathsf{W} = L^1(\Xi, \nu)$, Theorem 2
shows that $\{(\xi_{N,i}, W(\xi_{N,i}, \xi_{N,i}))\}_{1 \leq i \leq M_N}$ is asymptotically normal for $(\mu,
\tilde{\mathsf{A}}, \tilde{\mathsf{W}}, \tilde{\sigma}, \tilde{\gamma}, \{M_N^{1/2}\})$, where $\tilde{\mathsf{A}} = \{ f \in L^1(\Xi, \mu), \nu(W^2 f^2) < \infty \}$, $\tilde{\mathsf{W}} = \{ f \in
L^1(\Xi, \mu), \nu(W^2 |f|) < \infty \}$, $\tilde{\gamma}(f) = \nu[\bar{W}^2 f]$ for $f \in \tilde{\mathsf{W}}$, and $\tilde{\sigma}^2(f) = \mathrm{Var}_\nu\{\bar{W}[f - \mu(f)]\}$
for $f \in \tilde{\mathsf{A}}$, with $\bar{W} = W / \beta = d\mu/d\nu$; this agrees with (7). These expressions are
particularly simple because, in this case,
$W f = R(\cdot, W f)$, which is of course false in general.

3.2 Resampling 

Resampling is the second basic type of transformation used in sequential
Monte-Carlo methods; resampling converts a weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$
targeting a distribution $\nu$ into an equally weighted sample $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$
targeting the same distribution $\nu$ on $(\Xi, \mathcal{B}(\Xi))$. Note that the number of resampled
particles, denoted $\tilde{M}_N$, might well differ from the initial number of particles $M_N$.
In this section, we will focus on resampling schemes that satisfy the
following unbiasedness condition:

\[ \text{for any } f \in \mathbb{B}(\Xi) , \quad \mathbb{E}\Big[ \tilde{M}_N^{-1} \sum_{i=1}^{\tilde{M}_N} f(\tilde{\xi}_{N,i}) \,\Big|\, \mathcal{F}_N \Big] = \Omega_N^{-1} \sum_{i=1}^{M_N} \omega_{N,i} f(\xi_{N,i}) , \tag{17} \]

where $\mathcal{F}_N = \sigma(\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N})$. The unbiasedness condition (17) implies
that resampling can only increase the variability of the importance sampling
estimator and thus has, in one step, an adverse effect. In the sequential context,
however, resampling is nevertheless essential: it removes particles with
small importance weights and produces multiple copies of particles with large
importance weights, which helps in generating better future samples. There are
many different unbiased resampling procedures described in the literature. The
most elementary procedure is the so-called multinomial sampling in which we
draw, conditionally independently given $\mathcal{F}_N$, integer-valued random variables
$\{I_{N,k}\}_{1 \leq k \leq \tilde{M}_N}$ with distribution

\[ \mathbb{P}(I_{N,k} = i \mid \mathcal{F}_N) = \Omega_N^{-1} \omega_{N,i} , \quad i = 1, \dots, M_N , \tag{18} \]

and set

\[ \tilde{\xi}_{N,i} = \xi_{N, I_{N,i}} , \quad \text{for } i = 1, \dots, \tilde{M}_N . \tag{19} \]
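Multinomial sampling as in (18)-(19) amounts to drawing indices with probabilities proportional to the weights. The Python sketch below is a minimal illustration; the function name, the toy particles and the sample sizes are assumptions made for the example.

```python
import random


def multinomial_resample(particles, weights, m_out, rng):
    """Draw m_out particles i.i.d. with probabilities weights / sum(weights).

    random.choices normalizes the weights internally, so unnormalized
    importance weights can be passed directly, cf. (18)-(19).
    """
    return rng.choices(particles, weights=weights, k=m_out)


rng = random.Random(0)
particles = [0.0, 1.0, 2.0]
weights = [1.0, 2.0, 1.0]            # weighted mean is (0 + 2 + 2) / 4 = 1
resampled = multinomial_resample(particles, weights, 20000, rng)
resampled_mean = sum(resampled) / len(resampled)
```

Averaging a function over the resampled particles gives, conditionally on the original sample, the weighted average on the right-hand side of (17); here the resampled mean fluctuates around 1.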



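To bring down the extra variability incurred by multinomial sampling, other
resampling strategies have been considered in the literature. A possible
solution, investigated in Liu and Chen (1998), consists in using a combination of
deterministic plus residual sampling. This scheme consists in retaining at least
$\lfloor \tilde{M}_N \Omega_N^{-1} \omega_{N,i} \rfloor$, $i = 1, \dots, M_N$, copies of the particles and then reallocating the
remaining particles by applying the multinomial sampling procedure but on the
residual importance weights, $\tilde{M}_N \Omega_N^{-1} \omega_{N,i} - \lfloor \tilde{M}_N \Omega_N^{-1} \omega_{N,i} \rfloor$. Define $J_{N,0} = 0$ and
for $k \geq 1$, $J_{N,k} = \sum_{i=1}^{k} \lfloor \tilde{M}_N \Omega_N^{-1} \omega_{N,i} \rfloor$. The residual resampling algorithm is
defined as follows:

(i) For $i = 1, \dots, M_N$, assign

\[ \tilde{\xi}_{N,j} = \xi_{N,i} , \quad \text{for all } j = J_{N,i-1} + 1, \dots, J_{N,i} . \tag{20} \]

(ii) Draw, conditionally independently given $\mathcal{F}_N = \sigma(\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N})$,
random variables $\{I_{N,k}\}_{1 \leq k \leq \tilde{M}_N - \bar{M}_N}$, where

\[ \bar{M}_N \stackrel{\mathrm{def}}{=} J_{N, M_N} = \sum_{i=1}^{M_N} \lfloor \tilde{M}_N \Omega_N^{-1} \omega_{N,i} \rfloor , \]

\[ \mathbb{P}(I_{N,k} = i \mid \mathcal{F}_N) = \frac{\tilde{M}_N \Omega_N^{-1} \omega_{N,i} - \lfloor \tilde{M}_N \Omega_N^{-1} \omega_{N,i} \rfloor}{\tilde{M}_N - \bar{M}_N} , \quad i = 1, \dots, M_N , \tag{21} \]

and set $\tilde{\xi}_{N, \bar{M}_N + k} = \xi_{N, I_{N,k}}$ for all $1 \leq k \leq \tilde{M}_N - \bar{M}_N$.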
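Steps (i) and (ii) of the deterministic-plus-residual scheme can be sketched as follows. This is an illustrative Python version under assumptions of our own (function name, list-based particles, toy weights); it is not taken from the paper or from any particular library.

```python
import math
import random


def residual_resample(particles, weights, m_out, rng):
    """Deterministic-plus-residual resampling, following steps (i)-(ii)."""
    total = sum(weights)
    expected = [m_out * w / total for w in weights]   # expected copy counts
    counts = [math.floor(e) for e in expected]        # deterministic copies
    resampled = []
    for x, c in zip(particles, counts):
        resampled.extend([x] * c)                     # step (i), eq. (20)
    n_left = m_out - sum(counts)                      # remaining slots
    if n_left > 0:
        residuals = [e - c for e, c in zip(expected, counts)]
        # Step (ii): multinomial draw on the residual weights, eq. (21).
        resampled.extend(rng.choices(particles, weights=residuals, k=n_left))
    return resampled


rng = random.Random(0)
out = residual_resample(["a", "b", "c"], [6, 3, 1], 5, rng)
```

In this toy call the expected copy numbers are 3, 1.5 and 0.5, so three copies of "a" and one copy of "b" are allocated deterministically and a single slot is filled at random between "b" and "c".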

If the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ is consistent for $(\nu, \mathsf{C})$, where $\mathsf{C}$ is
a proper subset of $\mathbb{B}(\Xi)$, it is a natural question to ask whether $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$
is consistent for $\nu$ and, if so, what an appropriately defined class of functions on
$\Xi$ might be. It happens that a fairly general result can be obtained in this case.

Theorem 3. Let $\nu$ be a probability measure on $(\Xi, \mathcal{B}(\Xi))$ and $\mathsf{C} \subseteq L^1(\Xi, \nu)$ be a
proper set of functions. Assume that the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$
is consistent for $(\nu, \mathsf{C})$. Then, the uniformly
weighted sample $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ obtained using any unbiased resampling scheme
(i.e. satisfying (17)) is consistent for $(\nu, \mathsf{C})$.

It is also sensible in this discussion to strengthen the requirement of
consistency into asymptotic normality, and again prove that the sampling operation
transforms an asymptotically normal weighted sample for $\nu$ into an asymptotically
normal sample for $\nu$ (for appropriately defined classes of functions,
normalizing factors, etc.). We consider first the multinomial sampling algorithm.

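Theorem 4. Suppose that the assumptions of Theorem 3 hold. Let $\gamma$ be a finite
measure on $(\Xi, \mathcal{B}(\Xi))$, $\mathsf{A} \subseteq L^1(\Xi, \nu)$ and $\mathsf{W} \subseteq L^1(\Xi, \gamma)$ be proper sets, $\sigma$ be a
non negative function on $\mathsf{A}$ and $\{a_N\}$ be a non-negative sequence. Define

\[ \tilde{\mathsf{A}} \stackrel{\mathrm{def}}{=} \{ f \in \mathsf{A} , \ f^2 \in \mathsf{C} \} . \tag{22} \]

Assume in addition that

(i) the weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ is asymptotically normal for $(\nu,
\mathsf{A}, \mathsf{W}, \sigma, \gamma, \{a_N\})$,

(ii) $\lim_{N \to \infty} a_N = \infty$ and $\lim_{N \to \infty} a_N^2 / \tilde{M}_N = \beta$, where $\beta \in [0, \infty]$.

Then $\tilde{\mathsf{A}}$ is a proper set and the following holds true for the resampled system
$\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ defined as in (18) and (19).

(i) If $\beta < 1$, then $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically normal for $(\nu, \tilde{\mathsf{A}}, \mathsf{C}, \tilde{\sigma},
\tilde{\gamma}, \{a_N\})$ with

\[ \tilde{\sigma}^2(f) = \beta \mathrm{Var}_\nu(f) + \sigma^2(f) \quad \text{for any } f \in \tilde{\mathsf{A}} , \qquad \tilde{\gamma} = \beta \nu . \]

(ii) If $\beta \geq 1$, then $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically normal for $(\nu, \tilde{\mathsf{A}}, \mathsf{C}, \tilde{\sigma},
\tilde{\gamma}, \{\tilde{M}_N^{1/2}\})$ with

\[ \tilde{\sigma}^2(f) = \mathrm{Var}_\nu(f) + \beta^{-1} \sigma^2(f) \quad \text{for any } f \in \tilde{\mathsf{A}} , \qquad \tilde{\gamma} = \nu . \]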
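The two regimes of Theorem 4 can be summarized in a small helper that returns the limiting variance together with the normalizing rate. This is only a reading aid written in Python; the function name, its arguments and the string labels for the rates are assumptions introduced here.

```python
import math


def resampled_variance(beta, var_nu_f, sigma2_f):
    """Limiting variance after multinomial resampling, per the two cases above.

    beta     : limit of a_N^2 / (number of resampled particles), in [0, inf]
    var_nu_f : Var_nu(f), the plain Monte Carlo variance of f
    sigma2_f : sigma^2(f), the limiting variance before resampling
    Returns (variance, rate) where rate names the normalizing sequence.
    """
    if beta < 1.0:
        # Case (i): the original rate a_N is kept.
        return beta * var_nu_f + sigma2_f, "a_N"
    # Case (ii): the rate becomes the square root of the resampled size;
    # with beta = inf the second term vanishes.
    return var_nu_f + sigma2_f / beta, "sqrt(M_tilde_N)"


few_resampled = resampled_variance(math.inf, 2.0, 3.0)   # -> (2.0, "sqrt(M_tilde_N)")
many_resampled = resampled_variance(0.5, 2.0, 3.0)       # -> (4.0, "a_N")
```

When far fewer particles are resampled than were available ($\beta = \infty$), only the plain Monte Carlo variance $\mathrm{Var}_\nu(f)$ remains.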



Example 4. Let $\mu, \nu \in \mathcal{P}(\Xi)$ and suppose that $\mu \ll \nu$. As shown in the preceding
example, if $\{\xi_{N,i}\}_{1 \leq i \leq M_N}$ is an i.i.d. sample from $\nu$, then the weighted sample
$\{(\xi_{N,i}, W(\xi_{N,i}))\}_{1 \leq i \leq M_N}$, where $W \stackrel{\mathrm{def}}{=} d\mu/d\nu$, is consistent for $\{\mu, L^1(\Xi, \mu)\}$ and
asymptotically normal for $(\mu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{M_N^{1/2}\})$, where $\mathsf{A} = \{ f \in L^1(\Xi, \mu),
\nu[W^2 f^2] < \infty \}$, $\mathsf{W} = \{ f \in \mathbb{B}(\Xi), \nu[W^2 |f|] < \infty \}$, $\gamma(f) = \nu[W^2 f]$ for $f \in \mathsf{W}$,
and $\sigma^2(f) = \mathrm{Var}_\nu\{W[f - \mu(f)]\}$ for $f \in \mathsf{A}$. The sampling importance resampling
procedure outlined in Rubin (1987) consists in resampling the weighted sample
$\{(\xi_{N,i}, W(\xi_{N,i}))\}_{1 \leq i \leq M_N}$. It is recommended that the number $\tilde{M}_N$ of resampled
particles be much less than $M_N$, the number of elements in the importance
weighted sample $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \leq i \leq M_N}$ (in (Gelman and Rubin, 1992, pp. 459)
it is suggested to sample $\tilde{M}_N = 10$ out of $M_N = 1000$ elements). This suggestion
is supported by the results above. Theorem 3 shows that the uniformly
weighted sample $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is consistent for $\{\mu, L^1(\Xi, \mu)\}$. Assume that
$\beta \geq 1$. Theorem 4 shows that $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically normal for $(\mu,
\tilde{\mathsf{A}}, L^1(\Xi, \mu), \tilde{\sigma}, \mu, \{\tilde{M}_N^{1/2}\})$, where $\tilde{\mathsf{A}} = \{ f \in L^1(\Xi, \mu), \nu((W + W^2) f^2) < \infty \}$
and $\tilde{\sigma}$ is defined by $\tilde{\sigma}^2(f) = \mathrm{Var}_\mu(f) + \beta^{-1} \mathrm{Var}_\nu\{W[f - \mu(f)]\}$. Suppose now that
$\lim_{N \to \infty} M_N / \tilde{M}_N = \infty$ and thus $\beta = \infty$. Then, the limiting variance is the
basic Monte Carlo variance $\mathrm{Var}_\mu(f)$: the resampled particles can be thought
of as an i.i.d. sample from the target distribution $\mu$, thus providing a simple
theoretical justification for the SIR procedure.
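The SIR procedure discussed in this example can be sketched end to end in Python. Everything specific in the sketch is an assumption made for illustration: the Gaussian target $\mu = N(1, 1)$, the proposal $\nu = N(0, 2^2)$, the sample sizes (20000 weighted particles, 200 resampled ones) and the function names.

```python
import math
import random


def density_ratio(x):
    """W = dmu/dnu for mu = N(1, 1) and nu = N(0, 2^2) (illustrative choice)."""
    return 2.0 * math.exp(-0.5 * (x - 1.0) ** 2 + x * x / 8.0)


def sir(n_proposals, n_resampled, rng):
    """Sampling importance resampling: weight n_proposals draws, keep few."""
    proposals = [rng.gauss(0.0, 2.0) for _ in range(n_proposals)]   # i.i.d. from nu
    weights = [density_ratio(x) for x in proposals]                 # importance weights
    # Multinomial resampling of a much smaller, equally weighted sample.
    return rng.choices(proposals, weights=weights, k=n_resampled)


rng = random.Random(2)
sample = sir(20000, 200, rng)
sample_mean = sum(sample) / len(sample)    # fluctuates around mu(f) = 1 for f(x) = x
```

Because the number of resampled particles is a small fraction of the number of proposals, the resampled values behave approximately like independent draws from the target, which is the regime $\beta = \infty$ described above.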

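The analysis of the deterministic plus residual sampling is more involved. To
carry out the analysis, it is required to specify more precisely the importance
weight.

Theorem 5. Let $k$ be an integer, $\nu$ be a probability measure on $(\Xi, \mathcal{B}(\Xi))$, $\gamma$ be
a finite measure on $(\Xi, \mathcal{B}(\Xi))$, $\mathsf{A} \subseteq L^1(\Xi, \nu)$, $\mathsf{C} \subseteq L^1(\Xi, \nu)$ and $\mathsf{W} \subseteq L^1(\Xi, \gamma)$ be
proper sets, $\sigma$ be a non negative function on $\mathsf{A}$ and $\Phi$ be a non negative function
on $\Xi$. Assume that

(i) $\{(\xi_{N,i}, \Phi(\xi_{N,i}))\}_{1 \leq i \leq M_N}$ is consistent for $(\nu, \mathsf{C})$ and asymptotically normal
for $(\nu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{a_N\})$,

(ii) $\lim_{N \to \infty} \tilde{M}_N / M_N = \ell$, $\lim_{N \to \infty} a_N = \infty$ and $\lim_{N \to \infty} a_N^2 / \tilde{M}_N = \beta$, where
$\ell \in [0, \infty]$ and $\beta \in [0, \infty]$,

(iii) $1/\Phi \in \mathsf{C}$, and $\nu\big( \ell \nu(1/\Phi) \Phi \in \mathbb{N} \cup \{\infty\} \big) = 0$.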
Define 

LMV*)*J t<oc (23) 
£ = 00 . 

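Then, the following holds true for the uniformly weighted sample $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$
defined by the deterministic-plus-residual sampling ((20) and (21)).

(i) If $\beta < 1$, then $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically normal for $(\nu, \tilde{\mathsf{A}}, \mathsf{C}, \tilde{\sigma},
\tilde{\gamma}, \{a_N\})$ where $\tilde{\mathsf{A}}$ is given by (22) and

\[ \tilde{\gamma} \stackrel{\mathrm{def}}{=} \beta \nu . \]

(ii) If $\beta \geq 1$, then $\{(\tilde{\xi}_{N,i}, 1)\}_{1 \leq i \leq \tilde{M}_N}$ is asymptotically normal for $(\nu, \tilde{\mathsf{A}}, \mathsf{C}, \tilde{\sigma},
\tilde{\gamma}, \{\tilde{M}_N^{1/2}\})$ where $\tilde{\mathsf{A}}$ is given by (22) and

\[ \tilde{\gamma} \stackrel{\mathrm{def}}{=} \nu , \]

Because $W[\ell, \Phi] \leq 1$, for any $f \in \tilde{\mathsf{A}}$,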



< inf v{{f-c) 2 } =Var„(/) 



showing that the variance of the residual plus deterministic sampling is always
lower than that of the multinomial sampling. These results extend (Chopin,
2004, Theorem 2), which derives an expression of the variance of the residual
sampling in a specific case. Note however that the assumption Theorem 5 (iii) is missing
in the statement of (Chopin, 2004, Theorem 2). This assumption cannot be
relaxed, as shown in Section D.



4 An application to state-space models 

In this section, we apply the results developed above to state-space models. State-space models have become a powerful tool for dynamic systems. In such a model, an underlying state of interest changes over time and measurements are taken to enable inferences to be made about the state. The state process $\{X_k\}_{k \ge 1}$ is a Markov chain on a general state space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ with initial distribution $\chi$ and kernel $Q$. The observations $\{Y_k\}_{k \ge 1}$ are random variables taking values in a general state space $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ that are independent conditionally on the state sequence $\{X_k\}_{k \ge 1}$; in addition, there exist a measure $\lambda$ on $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ and a transition density function $x \mapsto g(x, y)$, referred to as the likelihood, such that $\mathbb{P}(Y_k \in A \mid X_k = x) = \int_A g(x,y)\,\lambda(dy)$ for all $A \in \mathcal{B}(\mathsf{Y})$. The kernel $Q$ and the likelihood functions $x \mapsto g(x, y)$ are assumed to be known. These quantities could be time-dependent. Such models have been used in a wide range of applications, including quantitative finance, engineering and natural sciences, and consequently have become of increasing interest to statisticians (see for instance (Liu, 2001, chapters 3 and 4) and Künsch (2001) for an introduction to that field).
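To make the ingredients $(\chi, Q, g)$ concrete, here is a minimal sketch of one particular state-space model. The scalar linear Gaussian model, the parameter names `phi`, `sigma_x`, `sigma_y` and their default values are assumptions chosen for illustration; the paper itself works with general kernels and likelihoods.

```python
import math
import random


def simulate_state_space(n_steps, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Simulate X_1..X_n and Y_1..Y_n from a scalar linear Gaussian model.

    chi = N(0, sigma_x^2), Q(x, .) = N(phi * x, sigma_x^2), and
    Y_k given X_k = x is N(x, sigma_y^2) (all illustrative assumptions).
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma_x)            # X_1 ~ chi
    states, observations = [], []
    for _ in range(n_steps):
        states.append(x)
        observations.append(x + rng.gauss(0.0, sigma_y))   # Y_k | X_k
        x = phi * x + rng.gauss(0.0, sigma_x)              # X_{k+1} ~ Q(X_k, .)
    return states, observations


def likelihood(x, y, sigma_y=0.5):
    """g(x, y): density of Y_k = y given X_k = x w.r.t. Lebesgue measure."""
    z = (y - x) / sigma_y
    return math.exp(-0.5 * z * z) / (sigma_y * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    xs, ys = simulate_state_space(5)
    print(xs)
    print(ys)
```

Given fixed observations $y_{1:k}$, the function $g_k(x) = g(x, y_k)$ used later corresponds to `likelihood(x, ys[k - 1])` in this sketch.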



In this paper, we are primarily concerned with the recursive estimation of the (joint) smoothing distribution, i.e. the conditional distribution of the state sequence $X_{1:k} := (X_1, \dots, X_k)$ given the $\sigma$-algebra generated by the observed process from time 1 to $k$; that is, one is interested in estimating the conditional expectation

$$\phi_{\chi,k}(y_{1:k}, f) := \mathbb{E}\left[ f(X_{1:k}) \mid Y_{1:k} = y_{1:k} \right] , \quad \text{where } f \in \mathbb{B}(\mathsf{X}^k) . \tag{24}$$

We shall consider the case in which the observations have an arbitrary but fixed value $y_{1:k}$, and we drop them from the notation. We denote $g_k(x) = g(x, y_k)$. A straightforward application of the Bayes formula shows that, for any $f \in \mathbb{B}_+(\mathsf{X}^k)$,

$$\phi_{\chi,k}(f) = \frac{\int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, Q(x_{k-1}, dx_k)\, g_k(x_k)\, f(x_{1:k})}{\int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, Q(x_{k-1}, dx_k)\, g_k(x_k)} . \tag{25}$$

In practice, these computations can only be performed in closed form for linear Gaussian models and for finite state-space models. Many approximation schemes have been proposed in the literature to tackle this problem, but most of the known solutions either suffer from poor performance and/or instability (Extended Kalman Filter, Gaussian sum filter, etc.), or are prohibitively expensive to implement (grid-based solutions). In the Monte-Carlo framework, we approximate the posterior distribution $\phi_{\chi,k}$ at each iteration $k$ by means of a weighted sample $\{(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$, where the superscript $k$ indicates the iteration index. Note that $\phi_{\chi,k}$ is defined on $(\mathsf{X}^{k+1}, \mathcal{B}(\mathsf{X}^{k+1}))$ and thus that the points $\xi^{(k)}_{N,i}$ belong to $\mathsf{X}^{k+1}$ (these are often referred to as path particles in the literature; see Del Moral (2004)).



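To apply the results presented above, it is first required to define a transition kernel $L_{k-1}$ satisfying the relation between $\nu$ and $\mu$ required there, with $\nu = \phi_{\chi,k-1}$, $(\Xi, \mathcal{B}(\Xi)) = (\mathsf{X}^k, \mathcal{B}(\mathsf{X}^k))$, $\mu = \phi_{\chi,k}$ and $(\tilde\Xi, \mathcal{B}(\tilde\Xi)) = (\mathsf{X}^{k+1}, \mathcal{B}(\mathsf{X}^{k+1}))$, i.e. for any $f \in \mathbb{B}_+(\mathsf{X}^{k+1})$,

$$\phi_{\chi,k}(f) = \frac{\int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, L_{k-1}(x_{1:k-1}, dx_{1:k})\, f(x_{1:k})}{\int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, L_{k-1}(x_{1:k-1}, \mathsf{X}^{k+1})} . \tag{26}$$

In the second step, we must choose a proposal kernel $R_{k-1}$ satisfying

$$L_{k-1}(x_{1:k-1}, \cdot) \ll R_{k-1}(x_{1:k-1}, \cdot) , \quad \text{for any } x_{1:k-1} \in \mathsf{X}^k . \tag{27}$$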

There are many possible choices, which will be associated with different algorithms proposed in the literature. The first obvious choice consists in setting, for any $f \in \mathbb{B}(\mathsf{X}^k)$,

$$L_{k-1}(x_{1:k-1}, f) = T_{k-1}(x_{1:k-1}, f) = \int Q(x_{k-1}, dx_k)\, g_k(x_k)\, f(x_{1:k-1}, x_k) . \tag{28}$$

Note that, by construction, the kernel $T_{k-1}(x_{1:k-1}, \cdot)$ leaves the coordinates $x_{1:k-1}$ unchanged. This corresponds to the so-called sequential importance sampling algorithm. The first obvious choice for the proposal is that of setting $R_{k-1} = Q_{k-1}$, where the kernel $Q_{k-1}$ is defined, for any $f \in \mathbb{B}_+(\mathsf{X}^k)$, by

$$Q_{k-1}(x_{1:k-1}, f) = \int Q(x_{k-1}, dx_k)\, f(x_{1:k-1}, x_k) . \tag{29}$$

With this particular choice, for any $x_{1:k-1} \in \mathsf{X}^k$ and $x_{1:k} \in \mathsf{X}^{k+1}$,

$$\frac{dT_{k-1}(x_{1:k-1}, \cdot)}{dQ_{k-1}(x_{1:k-1}, \cdot)}(x_{1:k}) \propto g_k(x_k) . \tag{30}$$



Note that the incremental weight $g_k(x_k)$ does not depend on $x_{1:k-1}$, that is, on the past path particle. The use of the prior kernel $R_{k-1} = Q_{k-1}$ is popular because sampling from the prior kernel $Q_{k-1}$ is often straightforward, and computing the incremental weight simply amounts to evaluating the conditional likelihood of the new observation given the current particle position. Often, significant gain can be expected by taking into account the new observation in
the mutation kernel R k -\ (see for instance Liu and Chenl (|l99Sft . IPoTicet et al 
(2003) and Doucet et al.). A possible choice is to set the instrumental kernel $R_{k-1}(x_{1:k-1}, \cdot)$ as the conditional distribution of the state $X_k$ given the previous state and the current observation, i.e. to set $R_{k-1} = T^\star_{k-1}$ where, for any $f \in \mathbb{B}_+(\mathsf{X}^k)$ and $x_{1:k-1} \in \mathsf{X}^{k-1}$,

$$T^\star_{k-1}(x_{1:k-1}, f) = \frac{\int Q(x_{k-1}, dx_k)\, g_k(x_k)\, f(x_{1:k-1}, x_k)}{\int Q(x_{k-1}, dx_k)\, g_k(x_k)} ,$$

which is often termed the optimal kernel. The incremental importance weight is thus given by

$$\frac{dT_{k-1}(x_{1:k-1}, \cdot)}{dT^\star_{k-1}(x_{1:k-1}, \cdot)}(x_{1:k}) = \int Q(x_{k-1}, dx_k)\, g_k(x_k) , \tag{31}$$



which depends on the last coordinate of the past value of the path particle $x_{1:k-1}$ but not on the current value of the particle offspring $x_{1:k}$. Most often, it is not easy to sample from $T^\star_{k-1}$ or to compute the importance weights $T_{k-1}(\xi^{(k-1)}_{N,i}, \mathsf{X}^{k+1})$.
A possible solution consists in sampling from $T^\star_{k-1}$ by accept-reject (see Tanizaki
(1992) and lKunschl^nn^ l: another solution is to approxim ate the optimal ker- 
nel by using more or less sophisticated t ricks (see for instance IShephard and Pitt 
( 19971 ) . IDoucet et all fcooch . iTanizakl (j200lh ). 
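The difference between the prior kernel and the optimal kernel can be made explicit in a case where both are tractable. The sketch below assumes a scalar linear Gaussian model, $X_k = \phi X_{k-1} + \text{noise}$ with variance `var_x` and $Y_k = X_k + \text{noise}$ with variance `var_y`; this model, the function names and the parameter values are illustrative assumptions, not the setting of the paper. With the prior kernel the incremental weight is $g_k$ evaluated at the new position, whereas with the optimal kernel it is $\int Q(x_{k-1}, dx_k) g_k(x_k)$, which depends only on the previous position.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prior_kernel_step(x_prev, y, rng, phi=0.9, var_x=1.0, var_y=0.25):
    """Propose from Q(x_prev, .); the incremental weight is g_k at the proposed point."""
    x_new = rng.gauss(phi * x_prev, math.sqrt(var_x))
    return x_new, normal_pdf(y, x_new, var_y)


def optimal_kernel_step(x_prev, y, rng, phi=0.9, var_x=1.0, var_y=0.25):
    """Propose from the law of X_k given (X_{k-1}, Y_k); weight = integral of Q * g_k."""
    post_var = 1.0 / (1.0 / var_x + 1.0 / var_y)
    post_mean = post_var * (phi * x_prev / var_x + y / var_y)
    x_new = rng.gauss(post_mean, math.sqrt(post_var))
    # Marginal density of Y_k given X_{k-1} = x_prev: it does not involve x_new.
    weight = normal_pdf(y, phi * x_prev, var_x + var_y)
    return x_new, weight


if __name__ == "__main__":
    rng = random.Random(0)
    print(prior_kernel_step(1.0, 0.5, rng))
    print(optimal_kernel_step(1.0, 0.5, rng))
```

Averaging the prior-kernel weights over many proposals from the same `x_prev` gives a Monte Carlo estimate of the optimal-kernel weight, which is one way to see that the optimal kernel removes the weight variability caused by the proposal.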

Other choices for $L_{k-1}$ and $R_{k-1}$ are possible. For example, Gilks and Berzuini (2001) have introduced a variant of this algorithm, the resample-move algorithm, in which the whole paths of the particles are mutated. This technique makes it possible to combat the progressive impoverishment of the system of particles as the dynamic process evolves. The construction goes as follows. Let $P_{k-1}$ be a kernel on $(\mathsf{X}^k, \mathcal{B}(\mathsf{X}^k))$ such that $\phi_{\chi,k-1}$ is an invariant distribution for $P_{k-1}$, i.e. $\phi_{\chi,k-1} P_{k-1} = \phi_{\chi,k-1}$. Such a kernel can be constructed easily by using for example the Metropolis-Hastings construction (which automatically guarantees the detailed balance condition). Then, define the kernel $L_{k-1}$, for any $f \in \mathbb{B}(\mathsf{X}^k)$ and $x_{1:k-1} \in \mathsf{X}^{k-1}$, by

$$L_{k-1}(x_{1:k-1}, f) = \int \cdots \int P_{k-1}(x_{1:k-1}, dx'_{1:k-1})\, Q(x'_{k-1}, dx'_k)\, g_k(x'_k)\, f(x'_{1:k}) .$$

Because $\phi_{\chi,k-1}$ is invariant for $P_{k-1}$, for any $f \in \mathbb{B}(\mathsf{X}^k)$,

$$\int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, P_{k-1}(x_{1:k-1}, dx'_{1:k-1})\, Q(x'_{k-1}, dx'_k)\, g_k(x'_k)\, f(x'_{1:k}) = \int \cdots \int \phi_{\chi,k-1}(dx_{1:k-1})\, Q(x_{k-1}, dx_k)\, g_k(x_k)\, f(x_{1:k}) , \tag{32}$$

showing that (26) holds. With this definition of $L_{k-1}$, the condition (27) is satisfied for example by the kernel $R_{k-1}$ given, for any $f \in \mathbb{B}(\mathsf{X}^k)$, by

$$R_{k-1}(x_{1:k-1}, f) = \int \cdots \int P_{k-1}(x_{1:k-1}, dx'_{1:k-1})\, Q(x'_{k-1}, dx'_k)\, f(x'_{1:k}) .$$

Other possible choices and implementation issues are given in Gilks and Berzuini (2001).
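The remark that the Metropolis-Hastings construction yields a kernel leaving a given distribution invariant can be verified exactly on a finite state space. The sketch below is a toy illustration under that assumption (the paper's kernel $P_{k-1}$ acts on path space); the function name, the target vector and the uniform proposal are hypothetical choices made for the example.

```python
def metropolis_hastings_matrix(target, proposal):
    """Transition matrix of the Metropolis-Hastings chain on {0, ..., n-1}.

    `target` is a probability vector, `proposal[i][j]` the probability of
    proposing j from i. Off-diagonal entries are proposal * acceptance;
    the rejected mass stays on the diagonal.
    """
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and proposal[i][j] > 0.0:
                ratio = (target[j] * proposal[j][i]) / (target[i] * proposal[i][j])
                matrix[i][j] = proposal[i][j] * min(1.0, ratio)
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


if __name__ == "__main__":
    pi = [0.1, 0.2, 0.3, 0.4]                 # toy target (assumption)
    q = [[0.25] * 4 for _ in range(4)]        # uniform proposal (assumption)
    p = metropolis_hastings_matrix(pi, q)
    # Invariance: (pi P)_j should reproduce pi_j up to rounding.
    print([sum(pi[i] * p[i][j] for i in range(4)) for j in range(4)])
```

Detailed balance, $\pi_i P_{ij} = \pi_j P_{ji}$, holds entry by entry here, and summing it over $i$ gives the invariance $\pi P = \pi$ used in (32).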
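We proceed from the weighted sample $\{(\xi^{(k-1)}_{N,i}, \omega^{(k-1)}_{N,i})\}_{1 \le i \le M_N}$ targeting $\phi_{\chi,k-1}$ to $\{(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$ targeting $\phi_{\chi,k}$ as follows. To keep the discussion simple, it is assumed that each particle gives birth to a single offspring. In the mutation step, we draw $\{\tilde\xi^{(k)}_{N,i}\}_{1 \le i \le M_N}$ conditionally independently given $\mathcal{F}^{(k-1)}_N$ with distribution given, for any $f \in \mathbb{B}_+(\mathsf{X}^k)$, by

$$\mathbb{E}\left[ f(\tilde\xi^{(k)}_{N,i}) \mid \mathcal{F}^{(k-1)}_N \right] = R_{k-1}(\xi^{(k-1)}_{N,i}, f) , \tag{33}$$

where $i = 1, \dots, M_N$. Next we assign to the particle $\tilde\xi^{(k)}_{N,i}$, $i = 1, \dots, M_N$, the importance weight

$$\tilde\omega^{(k)}_{N,i} = \omega^{(k-1)}_{N,i}\, W_{k-1}(\xi^{(k-1)}_{N,i}, \tilde\xi^{(k)}_{N,i}) \quad \text{with} \quad W_{k-1}(x_{1:k-1}, x'_{1:k}) = \frac{dL_{k-1}(x_{1:k-1}, \cdot)}{dR_{k-1}(x_{1:k-1}, \cdot)}(x'_{1:k}) . \tag{34}$$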

Instead of resampling at each iteration (which is the assumption upon which most of the asymptotic analyses have been carried out so far), we rejuvenate the particle system only when the importance weights are too skewed. As discussed in (Kong et al., 1994, section 4), a sensible approach is to monitor the coefficient of variation of the weights, defined by

$$\mathrm{CV}^{(k)}_N := \left[ \frac{1}{M_N} \sum_{i=1}^{M_N} \left( M_N \frac{\tilde\omega^{(k)}_{N,i}}{\tilde\Omega^{(k)}_N} - 1 \right)^2 \right]^{1/2} .$$

The coefficient of variation is minimal when the normalized importance weights $\tilde\omega^{(k)}_{N,i} / \tilde\Omega^{(k)}_N$, $i = 1, \dots, M_N$, are all equal to $1/M_N$, in which case $\mathrm{CV}^{(k)}_N = 0$. The maximal value of $\mathrm{CV}^{(k)}_N$ is $\sqrt{M_N - 1}$, which corresponds to one of the normalized weights being one and all others being null. Therefore, the coefficient of variation is often interpreted as a measure of the number of ineffective particles (those that do not significantly contribute to the estimate). A related criterion with a simpler interpretation is the so-called effective sample size $\mathrm{ESS}^{(k)}_N$ (Liu, 1996), defined as

$$\mathrm{ESS}^{(k)}_N := \left[ \sum_{i=1}^{M_N} \left( \frac{\tilde\omega^{(k)}_{N,i}}{\tilde\Omega^{(k)}_N} \right)^2 \right]^{-1} , \tag{35}$$

which varies between 1 (all weights null but one) and $M_N$ (equal weights). It is straightforward to check the relation

$$\mathrm{ESS}^{(k)}_N = \frac{M_N}{1 + \left[ \mathrm{CV}^{(k)}_N \right]^2} .$$
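The two criteria and the relation between them are easy to check numerically. The following sketch uses hypothetical function names and operates on a plain list of unnormalized weights; it is an illustration of the definitions, not code from the paper.

```python
import math


def coefficient_of_variation(weights):
    """CV of the unnormalized weights: sqrt( (1/M) * sum (M * w_i / Omega - 1)^2 )."""
    m = len(weights)
    total = sum(weights)
    return math.sqrt(sum((m * w / total - 1.0) ** 2 for w in weights) / m)


def effective_sample_size(weights):
    """ESS = 1 / sum of squared normalized weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


if __name__ == "__main__":
    w = [0.3, 1.2, 0.7, 2.5, 0.1]      # toy weights (assumption)
    cv = coefficient_of_variation(w)
    ess = effective_sample_size(w)
    # The two printed numbers should agree up to rounding.
    print(ess, len(w) / (1.0 + cv ** 2))
```

The relation follows from expanding the square: $[\mathrm{CV}]^2 = M \sum_i (\tilde\omega_i/\tilde\Omega)^2 - 1$.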



The effective sample size may be understood as a proxy for the equivalent number of i.i.d. samples at time $k$. Some additional insights and heuristics about the coefficient of variation are given by Liu and Chen. If the coefficient of variation of the importance weights (or equivalently, the ratio of the effective sample size to the total number of particles, $\mathrm{ESS}^{(k)}_N / M_N$) crosses a threshold, we rejuvenate the particle system. More precisely, if at time $k$, $\mathrm{CV}^{(k)}_N \ge \kappa$ (such time indices are called dynamic checkpoints in Liu et al. (2001)), we draw $I^{(k)}_{N,1}, \dots, I^{(k)}_{N,M_N}$ conditionally independently given $\tilde{\mathcal{F}}^{(k)}_N := \mathcal{F}^{(k-1)}_N \vee \sigma(\{(\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i})\}_{1 \le i \le M_N})$, with distribution

$$\mathbb{P}\left( I^{(k)}_{N,i} = j \mid \tilde{\mathcal{F}}^{(k)}_N \right) = \tilde\omega^{(k)}_{N,j} / \tilde\Omega^{(k)}_N , \quad i = 1, \dots, M_N , \; j = 1, \dots, M_N , \tag{36}$$

and we set

$$\xi^{(k)}_{N,i} = \tilde\xi^{(k)}_{N, I^{(k)}_{N,i}} \quad \text{and} \quad \omega^{(k)}_{N,i} = 1 , \quad i = 1, \dots, M_N . \tag{37}$$

If $\mathrm{CV}^{(k)}_N < \kappa$, we simply copy the mutated path particles: $(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i}) = (\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i})$, $i = 1, \dots, M_N$. In both cases, we set $\mathcal{F}^{(k)}_N = \tilde{\mathcal{F}}^{(k)}_N \vee \sigma(\{(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i})\}_{1 \le i \le M_N})$. We consider here only multinomial resampling, but the deterministic-plus-residual sampling can be applied as well.
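The selection rule just described (resample multinomially when the coefficient of variation reaches the threshold, otherwise keep the mutated particles and their weights) can be sketched as follows. The function names and the return convention are assumptions made for the example; `random.Random.choices` from the Python standard library is used to draw the indices.

```python
import math
import random


def coefficient_of_variation(weights):
    """CV of the unnormalized weights, as defined in the text."""
    m = len(weights)
    total = sum(weights)
    return math.sqrt(sum((m * w / total - 1.0) ** 2 for w in weights) / m)


def adaptive_multinomial_resampling(particles, weights, kappa, rng):
    """Return (particles, weights, resampled) after the selection step.

    If CV >= kappa, draw M indices with probabilities proportional to the
    weights and reset all weights to 1; otherwise copy the input unchanged.
    """
    if coefficient_of_variation(weights) >= kappa:
        m = len(particles)
        indices = rng.choices(range(m), weights=weights, k=m)
        return [particles[j] for j in indices], [1.0] * m, True
    return list(particles), list(weights), False


if __name__ == "__main__":
    rng = random.Random(0)
    print(adaptive_multinomial_resampling(["a", "b", "c", "d"], [10.0, 0.1, 0.1, 0.1], 0.5, rng))
    print(adaptive_multinomial_resampling(["a", "b"], [1.0, 1.0], 0.5, rng))
```

The threshold `kappa` plays the role of $\kappa$ in the text; how to choose it is not addressed by this sketch.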

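Theorem 6. For any $k > 0$, let $L_k$ and $R_k$ be transition kernels from $(\mathsf{X}^k, \mathcal{B}(\mathsf{X}^k))$ to $(\mathsf{X}^{k+1}, \mathcal{B}(\mathsf{X}^{k+1}))$ satisfying (26) and (27), respectively. Assume that the equally weighted sample $\{(\xi^{(1)}_{N,i}, 1)\}_{1 \le i \le M_N}$ is consistent for $\{\phi_{\chi,1}, L^1(\mathsf{X}, \phi_{\chi,1})\}$ and asymptotically normal for $(\phi_{\chi,1}, \mathsf{A}_1, \mathsf{W}_1, \sigma_1, \phi_{\chi,1}, \{M_N^{1/2}\})$, where $\mathsf{A}_1$ and $\mathsf{W}_1$ are proper sets, and define recursively $(\mathsf{A}_k)$ and $(\mathsf{W}_k)$ by

$$\mathsf{A}_k := \left\{ f \in L^2(\mathsf{X}^k, \phi_{\chi,k}) , \; L_{k-1}(\cdot, f) \in \mathsf{A}_{k-1} , \; R_{k-1}(\cdot, W_{k-1}^2 f^2) \in \mathsf{W}_{k-1} \right\} ,$$

$$\mathsf{W}_k := \left\{ f \in L^1(\mathsf{X}^k, \phi_{\chi,k}) , \; R_{k-1}(\cdot, W_{k-1}^2 |f|) \in \mathsf{W}_{k-1} \right\} .$$

Assume in addition that for any $k \ge 1$, $R_k(\cdot, W_k^2) \in \mathsf{W}_k$. Then for any $k \ge 1$, $(\mathsf{A}_k)$ and $(\mathsf{W}_k)$ are proper sets and $\{(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$ is consistent for $\{\phi_{\chi,k}, L^1(\mathsf{X}^k, \phi_{\chi,k})\}$ and asymptotically normal for $(\phi_{\chi,k}, \mathsf{A}_k, \mathsf{W}_k, \sigma_k, \gamma_k, \{M_N^{1/2}\})$, where the functions $\sigma_k$ and the measure $\gamma_k$ are given by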

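$$\sigma_k^2(f) = \varepsilon_k \operatorname{Var}_{\phi_{\chi,k}}(f) + \frac{\sigma_{k-1}^2\left( L_{k-1}\{f - \phi_{\chi,k}(f)\} \right) + \gamma_{k-1} R_{k-1}\left\{ [W_{k-1} f - R_{k-1}(\cdot, W_{k-1} f)]^2 \right\}}{\left\{ \phi_{\chi,k-1} L_{k-1}(\mathsf{X}^k) \right\}^2} ,$$

where $W_k$ is defined in (34) and

$$\varepsilon_k = 1\left\{ [\phi_{\chi,k-1} L_{k-1}(\mathsf{X}^k)]^{-2}\, \gamma_{k-1} R_{k-1}(W_{k-1}^2) \ge 1 + \kappa^2 \right\} .$$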

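Proof. Recall that the algorithm proceeds as follows. The weighted sample $\{(\xi^{(k-1)}_{N,i}, \omega^{(k-1)}_{N,i})\}_{1 \le i \le M_N}$ is mutated into $\{(\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$ as described by (33) and (34) (see also Section 3.1). The resulting particles are then resampled according to the multinomial resampling algorithm (as described in Section 3.2) so as to obtain a family of equally weighted particles that we denote by $\{(\hat\xi^{(k)}_{N,i}, 1)\}_{1 \le i \le M_N}$:

$$(\xi^{(k-1)}_{N,i}, \omega^{(k-1)}_{N,i}) \xrightarrow{\ \text{mutation}\ } (\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i}) \xrightarrow{\ \text{resampling}\ } (\hat\xi^{(k)}_{N,i}, 1) .$$

We then assign

$$(\xi^{(k)}_{N,i}, \omega^{(k)}_{N,i}) = \begin{cases} (\hat\xi^{(k)}_{N,i}, 1) & \text{if } \mathrm{CV}^{(k)}_N \ge \kappa , \\ (\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i}) & \text{otherwise.} \end{cases}$$

The proof now follows by induction. Assume that for some $k \ge 1$, the weighted sample $\{(\xi^{(k-1)}_{N,i}, \omega^{(k-1)}_{N,i})\}_{1 \le i \le M_N}$ is consistent for $\{\phi_{\chi,k-1}, L^1(\mathsf{X}^{k-1}, \phi_{\chi,k-1})\}$ and asymptotically normal for $(\phi_{\chi,k-1}, \mathsf{A}_{k-1}, \mathsf{W}_{k-1}, \sigma_{k-1}, \gamma_{k-1}, \{M_N^{1/2}\})$. By Theorems 1 and 2, $\{(\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$ is consistent for $\{\phi_{\chi,k}, L^1(\mathsf{X}^k, \phi_{\chi,k})\}$ and asymptotically normal for $(\phi_{\chi,k}, \tilde{\mathsf{A}}_k, \tilde{\mathsf{W}}_k, \tilde\sigma_k, \tilde\gamma_k, \{M_N^{1/2}\})$, where $\tilde{\mathsf{A}}_k$, $\tilde{\mathsf{W}}_k$, $\tilde\sigma_k$, $\tilde\gamma_k$ are defined from $\mathsf{A}_{k-1}$, $\mathsf{W}_{k-1}$, $\sigma_{k-1}$, $\gamma_{k-1}$ using Theorem 2. And by Theorems 3 and 4, $\{(\hat\xi^{(k)}_{N,i}, 1)\}_{1 \le i \le M_N}$ is consistent for $\{\phi_{\chi,k}, L^1(\mathsf{X}^k, \phi_{\chi,k})\}$ and asymptotically normal for $(\phi_{\chi,k}, \hat{\mathsf{A}}_k, \hat{\mathsf{W}}_k, \hat\sigma_k, \hat\gamma_k, \{M_N^{1/2}\})$, where $\hat{\mathsf{A}}_k$, $\hat{\mathsf{W}}_k$, $\hat\sigma_k$, $\hat\gamma_k$ are defined from $\tilde{\mathsf{A}}_k$, $\tilde{\mathsf{W}}_k$, $\tilde\sigma_k$, $\tilde\gamma_k$ using Theorem 4. The asymptotic normality of $\{(\tilde\xi^{(k)}_{N,i}, \tilde\omega^{(k)}_{N,i})\}_{1 \le i \le M_N}$ and $\{(\hat\xi^{(k)}_{N,i}, 1)\}_{1 \le i \le M_N}$, combined with

$$\left[ \mathrm{CV}^{(k)}_N \right]^2 = M_N \sum_{i=1}^{M_N} \left( \frac{\tilde\omega^{(k)}_{N,i}}{\tilde\Omega^{(k)}_N} \right)^2 - 1 \stackrel{P}{\longrightarrow} \tilde\gamma_k(1) - 1 \quad \text{and} \quad \varepsilon_k = 1\{\tilde\gamma_k(1) - 1 \ge \kappa^2\} ,$$

completes the proof.

□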



A Weak Limit Theorems for Triangular Arrays

This section summarizes various limit theorems for triangular arrays of dependent random variables. To keep the technical assumptions in our main theorems minimal, we derive these limit theorems under assumptions that are weaker than those typically used in the literature (see Hall and Heyde (1980)). Though our general results can be obtained by weakening the assumptions and adapting the proofs for triangular arrays of dependent random variables given in Dvoretzky (1972) and McLeish (1974), we prefer to develop them here independently. We hope that the greater accessibility of the proofs will compensate for this sacrifice of brevity. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, let $X$ be a random variable and let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$. Define $X^+ := \max(X, 0)$ and $X^- := -\min(X, 0)$. Following (Shiryaev, Section II.7), if

$$\min\left( \mathbb{E}[X^+ \mid \mathcal{G}] , \mathbb{E}[X^- \mid \mathcal{G}] \right) < \infty , \quad \mathbb{P}\text{-a.s.},$$

(a version of) the conditional expectation of $X$ given $\mathcal{G}$ is defined by

$$\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X^+ \mid \mathcal{G}] - \mathbb{E}[X^- \mid \mathcal{G}] ,$$

where, on the $\mathbb{P}$-null-set of sample points for which $\mathbb{E}[X^+ \mid \mathcal{G}] = \mathbb{E}[X^- \mid \mathcal{G}] = \infty$, the difference $\mathbb{E}[X^+ \mid \mathcal{G}] - \mathbb{E}[X^- \mid \mathcal{G}]$ is given an arbitrary value, for instance, zero. In particular, if $\mathbb{E}[|X| \mid \mathcal{G}] < \infty$ $\mathbb{P}$-a.s., then $\mathbb{E}[X^+ \mid \mathcal{G}] < \infty$ and $\mathbb{E}[X^- \mid \mathcal{G}] < \infty$ $\mathbb{P}$-a.s. and we may always define the conditional expectation in this context.

Let $\{M_N\}_{N \ge 0}$ be a sequence of positive integers satisfying $\lim_{N \to \infty} M_N = \infty$. Without loss of generality, we will assume that $\{M_N\}_{N \ge 0}$ is nondecreasing. Let $\{U_{N,i}\}_{1 \le i \le M_N}$ be a triangular array of random variables on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\{\mathcal{F}_{N,i}\}_{0 \le i \le M_N}$ be a triangular array of sub-$\sigma$-fields of $\mathcal{F}$ such that for each $N$ and each $i = 1, \dots, M_N$, $U_{N,i}$ is $\mathcal{F}_{N,i}$-measurable and $\mathcal{F}_{N,i-1} \subseteq \mathcal{F}_{N,i}$.

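Proposition 7. Assume that $\mathbb{E}[|U_{N,j}| \mid \mathcal{F}_{N,j-1}] < \infty$ $\mathbb{P}$-a.s. for any $N$ and any $j = 1, \dots, M_N$, and

$$\sup_N \mathbb{P}\left( \sum_{j=1}^{M_N} \mathbb{E}[|U_{N,j}| \mid \mathcal{F}_{N,j-1}] \ge \lambda \right) \to 0 \quad \text{as } \lambda \to \infty , \tag{38}$$

$$\sum_{j=1}^{M_N} \mathbb{E}\left[ |U_{N,j}|\, 1\{|U_{N,j}| \ge \epsilon\} \mid \mathcal{F}_{N,j-1} \right] \stackrel{P}{\longrightarrow} 0 \quad \text{for any positive } \epsilon . \tag{39}$$

Then,

$$\max_{1 \le i \le M_N} \left| \sum_{j=1}^{i} U_{N,j} - \sum_{j=1}^{i} \mathbb{E}[U_{N,j} \mid \mathcal{F}_{N,j-1}] \right| \stackrel{P}{\longrightarrow} 0 .$$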



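Proof. Assume first that for each $N$ and each $i = 1, \dots, M_N$, $U_{N,i} \ge 0$, $\mathbb{P}$-a.s. By (Dvoretzky, 1972, Lemma 3.5), we have that for any constants $\epsilon$ and $\eta > 0$,

$$\mathbb{P}\left( \max_{1 \le i \le M_N} U_{N,i} \ge \epsilon \right) \le \eta + \mathbb{P}\left( \sum_{i=1}^{M_N} \mathbb{P}(U_{N,i} \ge \epsilon \mid \mathcal{F}_{N,i-1}) \ge \eta \right) .$$

From the conditional version of the Chebyshev inequality,

$$\mathbb{P}\left( \max_{1 \le i \le M_N} U_{N,i} \ge \epsilon \right) \le \eta + \mathbb{P}\left( \sum_{i=1}^{M_N} \mathbb{E}\left[ U_{N,i}\, 1\{U_{N,i} \ge \epsilon\} \mid \mathcal{F}_{N,i-1} \right] \ge \eta\epsilon \right) . \tag{40}$$

Let $\epsilon$ and $\lambda > 0$ and define $\bar U_{N,i}$ by

$$\bar U_{N,i} = U_{N,i}\, 1\{U_{N,i} < \epsilon\}\, 1\left\{ \sum_{j=1}^{i} \mathbb{E}[U_{N,j} \mid \mathcal{F}_{N,j-1}] < \lambda \right\} .$$

For any $\delta > 0$,

$$\mathbb{P}\left( \max_{1 \le i \le M_N} \left| \sum_{j=1}^{i} U_{N,j} - \sum_{j=1}^{i} \mathbb{E}[U_{N,j} \mid \mathcal{F}_{N,j-1}] \right| \ge 2\delta \right) \le \mathbb{P}\left( \max_{1 \le i \le M_N} \left| \sum_{j=1}^{i} \bar U_{N,j} - \sum_{j=1}^{i} \mathbb{E}[\bar U_{N,j} \mid \mathcal{F}_{N,j-1}] \right| \ge \delta \right) + \mathbb{P}\left( \max_{1 \le i \le M_N} \left| \sum_{j=1}^{i} (U_{N,j} - \bar U_{N,j}) - \sum_{j=1}^{i} \mathbb{E}[U_{N,j} - \bar U_{N,j} \mid \mathcal{F}_{N,j-1}] \right| \ge \delta \right) .$$

The second term in the RHS is bounded by

$$\mathbb{P}\left( \max_{1 \le i \le M_N} U_{N,i} \ge \epsilon \right) + \mathbb{P}\left( \sum_{j=1}^{M_N} \mathbb{E}[U_{N,j} \mid \mathcal{F}_{N,j-1}] \ge \lambda \right) + \mathbb{P}\left( \sum_{j=1}^{M_N} \mathbb{E}\left[ U_{N,j}\, 1\{U_{N,j} \ge \epsilon\} \mid \mathcal{F}_{N,j-1} \right] \ge \delta \right) .$$

Eqs. (39) and (40) imply that the first and last terms in the last expression converge to zero for any $\epsilon > 0$, and (38) implies that the second term may be made arbitrarily small by choosing $\lambda$ sufficiently large. Now, by the Doob maximal inequality,

$$\mathbb{P}\left( \max_{1 \le i \le M_N} \left| \sum_{j=1}^{i} \bar U_{N,j} - \mathbb{E}[\bar U_{N,j} \mid \mathcal{F}_{N,j-1}] \right| \ge \delta \right) \le \delta^{-2} \sum_{j=1}^{M_N} \mathbb{E}\left[ \left( \bar U_{N,j} - \mathbb{E}[\bar U_{N,j} \mid \mathcal{F}_{N,j-1}] \right)^2 \right] .$$

This last term does not exceed

$$\delta^{-2} \sum_{j=1}^{M_N} \mathbb{E}(\bar U_{N,j}^2) \le \delta^{-2} \epsilon \sum_{j=1}^{M_N} \mathbb{E}[\bar U_{N,j}] \le \delta^{-2} \epsilon\, \mathbb{E}\left[ \sum_{j=1}^{M_N} \mathbb{E}[\bar U_{N,j} \mid \mathcal{F}_{N,j-1}] \right] \le \delta^{-2} \epsilon \lambda .$$

Since $\epsilon$ is arbitrary, the proof follows for $U_{N,j} \ge 0$, $\mathbb{P}$-a.s., for each $N$ and $j = 1, \dots, M_N$. The proof extends to an arbitrary triangular array $\{U_{N,j}\}_{1 \le j \le M_N}$ by applying the preceding result to $\{U^+_{N,j}\}_{1 \le j \le M_N}$ and $\{U^-_{N,j}\}_{1 \le j \le M_N}$. □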



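Lemma 8. Assume that for all $N$ and $i = 1, \dots, M_N$, $\sigma^2_{N,i} := \mathbb{E}[U^2_{N,i} \mid \mathcal{F}_{N,i-1}] < \infty$ $\mathbb{P}$-a.s. and $\mathbb{E}[U_{N,i} \mid \mathcal{F}_{N,i-1}] = 0$, and for all $\epsilon > 0$,

$$\sum_{i=1}^{M_N} \mathbb{E}\left[ U^2_{N,i}\, 1\{|U_{N,i}| \ge \epsilon\} \mid \mathcal{F}_{N,0} \right] \stackrel{P}{\longrightarrow} 0 . \tag{41}$$

Then, if any of the two following conditions holds

(i) $\sum_{i=1}^{M_N} \sigma^2_{N,i} = 1$,

(ii) $\sigma^2_{N,i}$ is $\mathcal{F}_{N,0}$-measurable and $\{\sum_{j=1}^{M_N} \sigma^2_{N,j}\}_{N \ge 0}$ is tight,

then for any real $u$,

$$\mathbb{E}\left[ \exp\left( \mathrm{i}u \sum_{j=1}^{M_N} U_{N,j} \right) \Big|\, \mathcal{F}_{N,0} \right] - \exp\left( -(u^2/2) \sum_{j=1}^{M_N} \sigma^2_{N,j} \right) \stackrel{P}{\longrightarrow} 0 . \tag{42}$$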
Proof. Write the following decomposition (with the convention $\sum_{j=a}^{b} = 0$ if $a > b$):



e 1U ^j=l U N,J_ e 2 2^3=1 °AT,j 



E 

i=l 



g 2 "AT,! ] g 2 2-ij=l+l"N,j 



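Moreover, if (i) or (ii) holds, $\sum_{j=1}^{l-1} U_{N,j}$ and $\sum_{j=l+1}^{M_N} \sigma^2_{N,j}$ are $\mathcal{F}_{N,l-1}$-measurable random variables. To see this, write, if (i) holds, $\sum_{j=l+1}^{M_N} \sigma^2_{N,j} = 1 - \sum_{j=1}^{l} \sigma^2_{N,j}$, and if (ii) holds, note that $\sum_{j=l+1}^{M_N} \sigma^2_{N,j}$ is an $\mathcal{F}_{N,0}$-measurable random variable and $\mathcal{F}_{N,0} \subseteq \mathcal{F}_{N,l-1}$. This implies that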



M M 9 



E 



M N \ / Mjv 

m^C/jvj -exp -(u 2 /2) ^o-^j | ^v.i 
i=i / V i=i 

Mat 

<^E[|E[exp(inC/ 7V ,/)|^-i]-exp(-nV^/2)||j- J v,o] • (43) 



For any e > 0, 



.2 2 



E [exp (iuU Nj i) | Fn,i-i] - 1 - ^^jv,! 



< ^H 3 E[|C/jv,H 3 1{|^| <e}|^,/-i] +w 2 E[C/^l{|?7 7 v,i| > e} | Fn,i-i) 

< \e\u\ 3 <>N,i + u 2 E [E^lfltfy,! > e} | J^.i-i] , 



and thus, 



E 



E 



E 



exp (iuU N j) - 1 - -u 2 a 2 N l 

M N 



N,l-1 



N,0 



M N M N 

<^M z Y, a h + u2 Y.^[ u h 1 {\ u m\>^\^NA ■ (44) 
i=i i=i 

Using either (i) or (ii) and since $\epsilon > 0$ is arbitrary, it follows from (41) that the RHS tends in probability to 0 as $N \to \infty$. Using again a Taylor inequality,



E 



M N 

E 



E 



exp (-u a N) i/2) - 1 - -u a N>l 



•Fn. 



i-i 



M N 



l=i 



max a at, 

1<1<M N ' 



/M N 



\Y,°ki) > ( 45 ) 



since under (i) or (ii), J2i=i a % i ls ^jv^-measurable. Because, for any e > 0, 



E 



max o" 

l<j<M N 



N,j 



M N 



<e 2 + J2v[U 2 N,i 1 {\UN, j \>e}\F N , ] 

3=1 
it follows from (41) and assumptions (i) or (ii) that the RHS of (45) tends in probability to 0 as $N \to \infty$. Therefore the LHS of (43) tends in probability to 0 because the sum on the RHS of this display is bounded by (44) and (45). □



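Proposition 9. Assume that $\sigma^2_{N,i} := \mathbb{E}[U^2_{N,i} \mid \mathcal{F}_{N,i-1}] < \infty$ $\mathbb{P}$-a.s., $\mathbb{E}[U_{N,i} \mid \mathcal{F}_{N,i-1}] = 0$ for all $N$ and $i = 1, \dots, M_N$, and

$$\sum_{i=1}^{M_N} \sigma^2_{N,i} \stackrel{P}{\longrightarrow} \sigma^2 \quad \text{for some } \sigma^2 > 0 , \tag{46}$$

$$\sum_{i=1}^{M_N} \mathbb{E}\left[ U^2_{N,i}\, 1\{|U_{N,i}| \ge \epsilon\} \mid \mathcal{F}_{N,i-1} \right] \stackrel{P}{\longrightarrow} 0 \quad \text{for any } \epsilon > 0 . \tag{47}$$

Then for any real $u$,

$$\mathbb{E}\left[ \exp\left( \mathrm{i}u \sum_{i=1}^{M_N} U_{N,i} \right) \Big|\, \mathcal{F}_{N,0} \right] \stackrel{P}{\longrightarrow} \exp(-\sigma^2 u^2 / 2) . \tag{48}$$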


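Proof. Without loss of generality, we assume that $\sigma^2 = 1$. Define the stopping time $\tau_N$ by

$$\tau_N := \max\left\{ 1 \le k \le M_N : \sum_{j=1}^{k} \sigma^2_{N,j} \le 1 \right\} ,$$

with the convention $\max \emptyset = 0$. Put $\bar U_{N,k} = U_{N,k}$ for $k \le \tau_N$, $\bar U_{N,k} = 0$ for $\tau_N < k \le M_N$ and $\bar U_{N,M_N+1} = \left( 1 - \sum_{j=1}^{\tau_N} \sigma^2_{N,j} \right)^{1/2} Y_N$, where $\{Y_N\}$ are $\mathcal{N}(0,1)$ independent and independent of $\mathcal{F}_{N,M_N}$. By construction,

$$\mathbb{E}\left[ \bar U^2_{N,M_N+1} \mid \mathcal{F}_{N,M_N} \right] = 1 - \sum_{j=1}^{\tau_N} \sigma^2_{N,j} < \infty , \quad \mathbb{P}\text{-a.s.}$$

and $\mathbb{E}[\bar U_{N,M_N+1} \mid \mathcal{F}_{N,M_N}] = 0$. The triangular array $\{\bar U_{N,k}\}_{1 \le k \le M_N+1}$ obviously satisfies the conditions of Lemma 8 for conditional means and variances. By construction,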



lis 



£E [U NJ l{\U Nij \ > e} | F Nfi ] = E 

3=1 



^[UnjIUUn^I > e}\F Nd -i] 

3=1 



(49) 



Since EJ=i E U Nj l{\U N ,j\ > e} Tnj-i < 1, then g7|) shows that the RHS 
of (|49|) converges in probability to as N — > oo. On the other hand, if tn < Mn, 



For any e > 0, 



< i -J2°h ^ a N,m+i ^ i^M^kj ■ 

3=1 



Mjv 



N 3=1 






Since e > is arbitrary, it follows from ijlTjl that 



3=1 

and since 1 — Xw=i a Nj — 1> this implies that E 
fore, {f/n./cli^fc^A/jv+i satisfies (@TJ). Put 



r? 2 



(50) 



0. There- 



Mjv Mjv + 1 Mjv 

E ^.i = E U N >3 ~ Un,m n +i + E U N j . 

3=1 3=1 j=TN+l 



(51) 



The first term on the RHS is asymptotically J\ (0, 1) and Un,m n +i — ► 0. It 
remains to prove that S/f^+i Unj — ► 0. First note that 



tat 



E = E ^ ~ 1 + 1 ~ E ^ 



(52) 



j=tjv+i 



i=i 



For any A > 0, 

/ M N 

E £ ETivjl 
\i=T"iv+i 



E ^< A 

j=TjV + l 



E E E <^ A 



l j=TN + l 



i=TN + l 



The term between brackets converges to in probability by (|52jl and its abso- 
lute value is bounded by A. Thus, by dominated convergence, this expectation 
converges to 0. Thus, 



Ma 



E u nA { E a h< x 



3=T~N + 1 



i=TN + l 



Moreover, 



M N 



p E %m E 4> A k° 



\j=TN + l 



i=TN + l 



M N 



<p Mje {!,..., M^v}, E °h> x = p E (J h> x 



i=T N +l 



,i=T N +l 



which converges to 0 by (52). The proof is completed.



□ 
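Proposition 9 can be illustrated by simulation on a simple dependent array. In the sketch below (an illustration under assumptions, not part of the proof), $U_{N,i} = N^{-1/2} s_i \epsilon_i$, where the $\epsilon_i$ are independent random signs and the scale $s_i = 1 + 0.5\,\epsilon_{i-1}$ is known from the past; the conditional variances $s_i^2/N$ then sum to approximately $1.25$, so the row sums should be approximately $\mathcal{N}(0, 1.25)$. The array, the function names and the sample sizes are all choices made for the example.

```python
import random
import statistics


def martingale_sum(n, rng):
    """One draw of sum_i U_{N,i} with U_{N,i} = scale_i * sign_i / sqrt(n)."""
    total = 0.0
    previous_sign = 1.0  # convention for the first step (assumption)
    for _ in range(n):
        scale = 1.0 + 0.5 * previous_sign      # depends only on the past
        sign = rng.choice((-1.0, 1.0))         # conditionally centred increment
        total += scale * sign / n ** 0.5
        previous_sign = sign
    return total


def simulate(n=400, replications=2000, seed=1):
    """Independent replications of the row sum, with a fixed seed."""
    rng = random.Random(seed)
    return [martingale_sum(n, rng) for _ in range(replications)]


if __name__ == "__main__":
    draws = simulate()
    # The empirical mean should be near 0 and the variance near 1.25.
    print(statistics.fmean(draws), statistics.pvariance(draws))
```

The increments are not independent, since each scale depends on the previous sign, but the conditions (46) and (47) hold for this array, which is what the proposition requires.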



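We will use the following technical lemma, which is a conditional version of (Dvoretzky, 1972, Lemma 3.3).

Lemma 10. Let $\mathcal{G}$ be a $\sigma$-field and $X$ a random variable such that $\mathbb{E}[X^2 \mid \mathcal{G}] < \infty$. Then, for any $\epsilon > 0$,

$$4\, \mathbb{E}\left[ |X|^2\, 1\{|X| \ge \epsilon\} \mid \mathcal{G} \right] \ge \mathbb{E}\left[ |X - \mathbb{E}[X \mid \mathcal{G}]|^2\, 1\{|X - \mathbb{E}[X \mid \mathcal{G}]| \ge 2\epsilon\} \mid \mathcal{G} \right] .$$

Proof. Let $Y = X - \mathbb{E}[X \mid \mathcal{G}]$. We have $\mathbb{E}[Y \mid \mathcal{G}] = 0$. It is equivalent to show that for any $\mathcal{G}$-measurable random variable $c$,

$$\mathbb{E}\left[ Y^2\, 1\{|Y| \ge 2\epsilon\} \mid \mathcal{G} \right] \le 4\, \mathbb{E}\left[ |Y + c|^2\, 1\{|Y + c| \ge \epsilon\} \mid \mathcal{G} \right] .$$

On the set $\{|c| < \epsilon\}$,

$$\mathbb{E}\left[ Y^2\, 1\{|Y| \ge 2\epsilon\} \mid \mathcal{G} \right] \le 2\, \mathbb{E}\left[ ((Y + c)^2 + c^2)\, 1\{|Y + c| \ge \epsilon\} \mid \mathcal{G} \right] \le 2(1 + c^2/\epsilon^2)\, \mathbb{E}\left[ (Y + c)^2\, 1\{|Y + c| \ge \epsilon\} \mid \mathcal{G} \right] \le 4\, \mathbb{E}\left[ (Y + c)^2\, 1\{|Y + c| \ge \epsilon\} \mid \mathcal{G} \right] .$$

Moreover, on the set $\{|c| \ge \epsilon\}$, using that $\mathbb{E}[cY \mid \mathcal{G}] = c\, \mathbb{E}[Y \mid \mathcal{G}] = 0$,

$$\mathbb{E}\left[ Y^2\, 1\{|Y| \ge 2\epsilon\} \mid \mathcal{G} \right] \le \mathbb{E}\left[ Y^2 + c^2 - \epsilon^2 \mid \mathcal{G} \right] \le \mathbb{E}\left[ (Y + c)^2 - \epsilon^2 \mid \mathcal{G} \right] \le \mathbb{E}\left[ (Y + c)^2\, 1\{|Y + c| \ge \epsilon\} \mid \mathcal{G} \right] .$$

The proof is completed. □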
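As a sanity check, the inequality of Lemma 10 can be evaluated exactly in the unconditional case (trivial $\sigma$-field) for a discrete random variable. The helper name and the toy distribution below are assumptions made for the illustration.

```python
def lemma10_sides(values, probs, eps):
    """Return (4 * E[X^2 1{|X| >= eps}], E[(X - EX)^2 1{|X - EX| >= 2 eps}])."""
    mean = sum(p * x for x, p in zip(values, probs))
    lhs = 4.0 * sum(p * x * x for x, p in zip(values, probs) if abs(x) >= eps)
    rhs = sum(
        p * (x - mean) ** 2
        for x, p in zip(values, probs)
        if abs(x - mean) >= 2.0 * eps
    )
    return lhs, rhs


if __name__ == "__main__":
    xs = [-3.0, -1.0, 0.5, 2.0, 10.0]     # toy support (assumption)
    ps = [0.1, 0.3, 0.3, 0.2, 0.1]        # toy probabilities (assumption)
    for eps in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(eps, lemma10_sides(xs, ps, eps))
```

For every threshold the first number should dominate the second, as the lemma asserts.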

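Corollary 11. Assume that for each $N$ and $i = 1, \dots, M_N$, $\mathbb{E}[U^2_{N,i} \mid \mathcal{F}_{N,i-1}] < \infty$ and

$$\sum_{i=1}^{M_N} \left\{ \mathbb{E}[U^2_{N,i} \mid \mathcal{F}_{N,i-1}] - (\mathbb{E}[U_{N,i} \mid \mathcal{F}_{N,i-1}])^2 \right\} \stackrel{P}{\longrightarrow} \sigma^2 \quad \text{for some } \sigma^2 > 0 , \tag{53}$$

$$\sum_{i=1}^{M_N} \mathbb{E}\left[ U^2_{N,i}\, 1\{|U_{N,i}| \ge \epsilon\} \mid \mathcal{F}_{N,i-1} \right] \stackrel{P}{\longrightarrow} 0 \quad \text{for any } \epsilon > 0 . \tag{54}$$

Then, for any real $u$,

$$\mathbb{E}\left[ \exp\left( \mathrm{i}u \sum_{i=1}^{M_N} \{U_{N,i} - \mathbb{E}[U_{N,i} \mid \mathcal{F}_{N,i-1}]\} \right) \Big|\, \mathcal{F}_{N,0} \right] \stackrel{P}{\longrightarrow} \exp(-(u^2/2)\sigma^2) . \tag{55}$$

If in addition $\mathcal{F}_{N,i} \subseteq \mathcal{F}_{N',i}$ for $N' \ge N$ and $i \le M_N$, then

$$\sum_{i=1}^{M_N} \{U_{N,i} - \mathbb{E}[U_{N,i} \mid \mathcal{F}_{N,i-1}]\} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0, \sigma^2) \quad \text{(mixing)} . \tag{56}$$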

Proof. Set L^v.i = Cjv,i — E [C/jv,i | ^jv,i-i]- By construction E [f7jv,j | ^TV.i-i] = 

- (EfC/^il^i-i]) 2 , © is 



and because E 



r? 2 



NA-l 



E 



r/ 2 



fulfilled. The proof of (|47|1 follows from Lemma fTUl It remains to show (|56|). 
Let = Vi°^jv,Mjv the cr-field generated by U^°^Fn,m n - For any £" G ^oo and 
any e > 0, there exists an m such that i? G T m ,M m and P(EAE') < e (where A 
denotes the symmetric difference). Denotes Vna = f^JV.i — E [£/jv,j | ^j\r,i-i]- 



E 



Ma 



exp in^% 1(E) 



-E 



Ah 



exp iu^yjv,, l(E') 



i=l 



< P(EAE') < e 
Moreover, applying (55) to the triangular array $\{U_{N,M_m+i}\}_{1 \le i \le M_N - M_m}$ (with $N$ sufficiently large) associated to the $\sigma$-fields $\{\mathcal{F}_{N,M_m+i}\}_{0 \le i \le M_N - M_m}$,



E 



exp m V N,i HE) 



i=m+l 



exp(-«V72)P(^) 



and maxi^j^Mjv \ Vna\ — ► 0. Thus, 



E 



A I 



'jv 



exp mJ^VNj HE) 



i=l 



exp(-uV/2)P(£) 



This implies that for any E' G 



E 



exp iu^Viv.i HE') 



i=i 



exp(-uV/2)P(£') 



Now, let E G T % 



E 



exp m^Fjv.i 



i=l 



E 



Ma 



exp I iu^Viv,i JE[ 1(^)1 ^oo] 

exp(-uV/2) E(E [l(-E) | .Foe]) = exp(-uV/2) P(£). 
The proof is completed. □ 

B Proof of Theorems 1 and 2

Proof of the Theorem^ We set T Nfi = a ({ (£ N>i , Wjv.t)}^^^) and for j = 
1, . . . , Mat, ^tvj = ^/v^Vcr ^{£jv,fc}i<fc<j) • Checking that C is proper is straight- 
forward, so we turn to the consistency. We show first that for any / G C, 



ttjV^jV 



(57) 



where fjvj and (Djyj are defined in (fTT|) and (fT2|) . respectively. Because the 
mutation step is unbiased (see (113(1 ). 



/(6vj) 



j'-i 



i=l 



Because the weighted sample {(^j\r,i) ^JV,i)}i<j<Mjv is consistent for (i/, C) and 
for / G C, the function L(-, /) G C, 



Mjv 



1 X ^ P 

fi iv 2^,u N>i L(£ Nji , f) — > ^(/) 



2=1 
it suffices to show that

Mjv 

(a N n N )~ l ^2 \fi N) jf(£ N>j ) - E 0JN,jf(iN,j) Fnj-i] } . (58) 
i=i 

Put Unj = (c>!iv^iv) _1 WAr,j/(^Afj) for j = 1,...,Mjv and appeal to Proposi- 
tion [7| Just as above, 

M N M N 

J2E[\u Ntj \\r N j-i] = n- 1 J2"N, l L(tN,i,\f\) -^^(l/l) , 

j=l i=l 

showing that the sequence I E [ I Ujy j I I Fn i-i \ \ is tight (Proposition!?}- 

l J ~~ ) N>0 

Eq.®). For any e > 0, put A N = Y.f=i E > e} I -^vj-i]- We 

need to show that An — ► (Proposition I71-Ea.([55j)). For any positive C, 
£ E 3, > C}) < R(t,W\f\) = L(£,|/|). Because the func- 

tion L(-,|/|) belongs to the proper set C, the function R(-, W|/|1{W|/| > C}) 
belongs to C. Hence for all C, e > 0, 



AmI < (aM^Ar) 1 max ujMi<e/C> 
{ l<i<M N ' J 



M N 
i=l 



ON 



<*N52R(tN,i,w\f\i{w\f \ >c}) 



k=l 



JUuR(W\f\l{W\f\>C}) . 

By dominated convergence, the RHS can be made arbitrarily small by letting 

— 1 p 
C — > oo. Combining with Q N m&xi<i<M N u)N,i — ► 0, this shows that A n tends 

to zero in probability, showing (|39|) . Thus Proposition UJ applies and (jHTjl holds. 

Under the stated assumptions, the function L(-,S) belongs to C, implying that 

the constant function g = 1 satisfies (|57|l: therefore, (ckatSItv)" 1 X^jii """^ 
z^L(H). Combined with (|57|1 this shows that for any / G C, 

To complete the proof of the consistency, it remains to prove that 

~_i P 
Q N max u>n j — ► . 

l<j<M N 

i ' — P 

Because (oatOtv) - Ojv — ► ^(S), it is actually sufficient to show that 

— i P 

(oAr^iv) max ujnj — ► . 
For any $C > 0$,
(ajv^iv)" 1 



max u Nd lr w( z )<c} < C {a N Q, N ) 1 

l<j<M N x ^"'i'- 1 



max uj 7v , 

l<i<M N 



(ajv^jv) 1 max lun jl 

1<3<M N 



{W(€iv J )>C} ^ (ow^jv) E^" 1 {W(«W,0>O} 

J'=l 

P 



z/L({W > C}) 



The term in the RHS of the last equation goes to zero as C — > oo which concludes 
the proof. □ 

Proof of the Theorem First we note that by definition a is necessarily at least 
1. Checking that A and W is proper is straightforward. Pick / E A and as- 
sume, without loss of generality, that fi(f) = 0. Write O^ 1 &N,if{£,N,i) — 
(o.nQn/&n) {An + B N ) , with 



M N 



N,j-1 



A N = (on^n)' 1 ^E [a)jvj/(6v,, 

M N 

B N = (un^n)" 1 ^ |^JVj/(6v,j) - E U> N ,jf(£N x 



M N 

^A/ 1 X] U N,iL(^N,i, f) , 
i=l 



Because on^n/^n converges to l/uL(3) in probability (see in the proof of 
Theorem [J), the conclusion of the theorem follows from Slutsky's theorem if we 
prove that m]I 2 {A n + B n ) converges weakly to N(0, a 2 (Lf) + a _ V(/)) where 

v\f)=jR{[Wf-R(-,Wf)] 2 }, (59) 
with W given in IjlOJI . The function L(-, /) belongs to A and vL(f) = n(f) vL{3) = 

1/2 

0. Because {(Cn,i, &N,i)}i<i<M N is asymptotically normal for (is, A, W, a, 7, {-M^ }), 
M^J 2 An — — > N(0, o~ 2 (Lf)). Next we prove that for any real u, 



E 



~ 1/9 
exp(mMjy Bn] 



Nfi 



exp(-(n 2 /2)r ? 2 (/)) . 



where r] 2 {f) is defined in (|59|). For that purpose we use corollary 1111 and we 
thus need to check l[53j ) -(|54j ) with 

U N ,j = f (on^n)' 1 MlJ 2 u;N,jf(iN,j) , j = !,■■■, Mn ■ 

Under the stated assumptions, for $f \in \mathsf{A}$, the function $R(\cdot, W^2 f^2)$ belongs to $\mathsf{W}$. By the Jensen inequality, for any $\xi \in \Xi$, $\{L(\xi, f)\}^2 = \{R(\xi, Wf)\}^2 \le R(\xi, W^2 f^2)$. Because the set $\mathsf{W}$ is proper and the function $R(\cdot, W^2 f^2) \in \mathsf{W}$, the relation $\{L(\cdot, f)\}^2 \le R(\cdot, W^2 f^2)$ implies that the function $\{L(\cdot, f)\}^2$ also belongs to $\mathsf{W}$. Because $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \le i \le M_N}$ is asymptotically normal for $(\nu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{M_N^{1/2}\})$, the definition of asymptotic normality implies



M N M N 

^ E [U 2 N>j I Fsj-i] =^Y. <MtN* W 2 f) 7 R(W 2 f) , 

j=l U N i=1 

M N M N 

3=1 U * i=l J 

These displays imply that (|53j) holds. It remains to check (|54|) . We have, for all 
C, e>0, 

'm n 



^2 E [ U N,j 1 {\U N , J \>e} FN,j 



[ M N max!<i<M N < ( e \ 2 \ < 



J =1 

M N 



^^h R (^w 2 fi{\wf\>c})^ lR (w 2 fi{\wf\>c}) 

N i=i 

which converges to as C goes to infinity. Combining with 

(Mn) 1 / 2 ^- 1 max W7v,i^0 

l<i<Mjv 

yields 

M N 

[Ujr d l{\U NJ \ > e} I J-atj-i] , 

5=1 

and this is condition (|54|) . Thus corollary ^2 applies. It follows that 

N(0,diag{<7 2 (L/),77 2 (/)}) , 



m]I 2 a n \ ^ 

M l J 2 B N I 



where t] 2 {f) is given in (|59jl. The proof of condition (0J) in the definition of an 

1/2 

asymptotically normal sample is now concluded upon writing MJ (An + Bn) = 

M% 2 A N + aJf 1/2 Mll 2 B N . 

-~ p 

Consider now Recalling that f2Ar/(a7v^7v) — ► vL(5), it is sufficient to 

show that for / G W, 

(^EWW ^ «" , (60) 

where /i = W 2 f. Define t/jvj = (aN^N)~ 2 MN^% jfiCNj)- Under the stated 
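assumptions, for any $f \in \mathsf{W}$, the functions $R(\cdot, |h|)$ and $R(\cdot, h)$ belong to $\mathsf{W}$. Because $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \le i \le M_N}$ is asymptotically normal for $(\nu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{M_N^{1/2}\})$,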

Mjv Mat 

^En^ii^v,^] = -^^^(e^N) , (6i) 

j=l N i = i 

M N M N 

V E [U NJ I Jwj-i] = -^V^fo, *0 a _1 7fl(/i) • (62) 
We appeal to Proposition [7| Eq. (ff)T|) shows that the tightness condition (|38|). 



For e > 0, set A at = Ejl^ E l^jl 1 !!^!^} 



We need to show 



that A N 0. For any positive C and £ € S, (£, |/i|l{|/i| > C}) < 
Because the function |/i|) belongs to W and the set W is proper, the function 
R(; \h\l{\h\ > C}) belongs to C. Hence for all C, e > 0, 

^ivl< max —2 < — > 

^ % (a N u N y CJ 
/If a/jv 

<^^h R (^,\h\l{\h\>C})^ 7 R(\h\l{\h\>C}) . 



By the dominated convergence theorem, the RHS can be made arbitrarily small 

by letting C — ► 00. Combining with Mjv(ayv^Ar) -1 maxi<i<M N i — > 0, this 
shows that An tends to zero in probability, showing (|39|) . Thus Proposition[3ap- 
plies and condition (J5J) is proved. To complete the proof, it remains to prove (jBJ. 

Combining with £In / '{oln^-n) ~^ vL(a) (see proof of Theorem^) and Mn = 

onMn, it is actually sufficient to show that [aN^%) 1 Mjv max i<j<Mjv ^JV j ""^ 
0. For any C > 0, 

max^.^^l^- maxi<i< Mjv wjy, P 
Mat < G Mat 7?T^2 ' u ■ 

Applying (|SU|l with / = 1, it holds that 

M max i<j<MA f ' :) ^ 1 {W(^, J -)>C} ffi_ 2 

" c^tv) 2 " «iv(^7v) 2 2^,jV(^.)>C} 

j=i 

-^7#(^ 2 1{W > C}) . 

The term on the RHS goes to zero as C — > 00. This proves condition @ and 
concludes the proof. □ 

C Proof of Theorems 3 and 4

Proof of Theorem 3. As above, we set $\mathcal{F}_{N,0} := \sigma(\{(\xi_{N,i}, \omega_{N,i})\}_{1 \le i \le M_N})$ and for $j = 1, \dots, \tilde M_N$, $\mathcal{F}_{N,j} := \mathcal{F}_{N,0} \vee \sigma(\{\tilde\xi_{N,k}\}_{1 \le k \le j})$. Pick $f$ in $\mathsf{C}$. Since $\mathsf{C}$ is proper, $|f| 1\{|f| \ge C\} \in \mathsf{C}$ for any $C > 0$. Because $\{(\xi_{N,i}, \omega_{N,i})\}_{1 \le i \le M_N}$ is consistent for $(\nu, \mathsf{C})$, and $\mathsf{C}$ is a proper set of functions,



M N 



i=l 



L {|/ttw,i)|>C} 



N,i-1 



n Nj2 U N,i\f(^N,i)\l{\f^ N>i )\>C} ► v (\f\l{\f\>C}) > (63) 



i=l 



We now check ^ -({35] ) of Proposition For any i = 1,...,M N , U N;i 
M^f{i N ,i) • Taking C = in 



def 



^E[|C/^||^ v ,i-i]=M^ 1 ^E[|/(| 



i=l 



i-1 



KI/l)<oo 



i=l 



whence the sequence {X)i=i E [|t^jv,j| I •^jv,i-i]}.iV>o is tight. Next, for any posi- 
tive e and C we have for sufficiently large N, 



i=i 



»l A {|C/jv,il>e} 



i-1 



1 M N 

^g E [l/««„)H 



{|/K£v,i)>eJ*w} 



JV,j-l 



Mat 



< M 



N 



i=l 



N,i)\l 



{\f\(iN,i)>c} 



i-1 



^(i/i^i/i^o}) 



By dominated convergence the RHS of this display tends to zero as $C \to \infty$. Thus the LHS of the display converges to zero in probability, showing (39). □

Proo/ o/ toe T/ieoremgJ Pick / G A and write M^ 1 ^fii /(£i\r,i)-K/) = ^tv + 
£>jv with 

Aat = n^ 1 ^wj\r,i{/(6v,t) - , 
1=1 

P?jv = M^ 1 £ j/fc) - E [/(|iv,i) I 

i=l 



We first prove that 
E 



exp ( in M^/ 2 -Bjy 



7V,0 



exp(-( U 2 /2)Var 1 ,(/)) . (64) 



def 



We will appeal to corollary ^2 and hence need to check (|53 |) -(|54 |l with Un, 

~ — 1/2 

M N f(^N,i)- First, because {(£,N,i,^N,i)}i<i<M N is consistent for (v,C) and 
for $f \in \mathsf{A}$, $f^2 \in \mathsf{C}$,



^{E [C/^. | F Njj _ x ] - (E [C^j | ^-i]) 2 } 
3=1 



JV,5 



K/ 2 )-M/)} 2 =Var I/ (/), 



i=i 



i=i 



showing Q53|) . Pick e > 0. For any positive constant C, 



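\[
\sum_{j=1}^{\tilde M_N} \mathbb{E}\left[ U_{N,j}^2 \mathbf{1}\{|U_{N,j}| \ge \epsilon\} \,\middle|\, \mathcal{F}_{N,j-1} \right]
\le \tilde M_N^{-1} \sum_{j=1}^{\tilde M_N} \mathbb{E}\left[ f^2(\tilde\xi_{N,j}) \mathbf{1}\{|f(\tilde\xi_{N,j})| \ge C\} \,\middle|\, \mathcal{F}_{N,j-1} \right]
= \Omega_N^{-1} \sum_{i=1}^{M_N} \omega_{N,i} f^2(\xi_{N,i}) \mathbf{1}\{|f(\xi_{N,i})| \ge C\} ,
\]

where the inequality holds for sufficiently large $N$. Since $f^2$ belongs to the
proper set $\mathsf{C} \subseteq L^1(\Xi, \nu)$, we have $f^2 \mathbf{1}\{|f| \ge C\} \in \mathsf{C}$. This implies that the
RHS of the above display converges in probability to $\nu(f^2 \mathbf{1}\{|f| \ge C\})$. Because
$f^2 \in \mathsf{C} \subseteq L^1(\Xi, \nu)$, $\nu(f^2 \mathbf{1}\{|f| \ge C\})$ tends to zero as $C \to \infty$, so that (54) is
satisfied.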

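Combining (64) with $a_N A_N \xrightarrow{D} N(0, \sigma^2(f))$, we find that for any real numbers
$u$ and $v$,

\[
\mathbb{E}\left[ \exp\left( \mathrm{i} ( u \tilde M_N^{1/2} B_N + v a_N A_N ) \right) \right]
= \mathbb{E}\left[ \mathbb{E}\left[ \exp\left( \mathrm{i} u \tilde M_N^{1/2} B_N \right) \,\middle|\, \mathcal{F}_{N,0} \right] \exp\left( \mathrm{i} v a_N A_N \right) \right]
\to \exp\left\{ -(u^2/2) \operatorname{Var}_\nu(f) \right\} \exp\left\{ -(v^2/2) \sigma^2(f) \right\} .
\]

Here the first equality uses that $A_N$ is $\mathcal{F}_{N,0}$-measurable. Thus the bivariate characteristic function converges to the characteristic function
of a bivariate normal, implying that

\[
\begin{pmatrix} a_N A_N \\ \tilde M_N^{1/2} B_N \end{pmatrix}
\xrightarrow{D} N\left( 0, \operatorname{diag}\left[ \sigma^2(f), \operatorname{Var}_\nu(f) \right] \right) .
\]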



Put $b_N = a_N$ if $\alpha < 1$ and $b_N = \tilde M_N^{1/2}$ if $\alpha \ge 1$. The proof follows from

\[
b_N (A_N + B_N) = (b_N a_N^{-1}) \, a_N A_N + (b_N \tilde M_N^{-1/2}) \, \tilde M_N^{1/2} B_N .
\]

The two remaining conditions of asymptotic normality are obviously fulfilled, using that the weighted sample
$\{(\tilde\xi_{N,i}, 1)\}_{1 \le i \le \tilde M_N}$ is consistent for $(\nu, \mathsf{C})$. □
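
The variance computation used to check (53) can be explored numerically. Conditionally on $\mathcal{F}_{N,0}$, the multinomially resampled particles are i.i.d. draws from the weighted empirical measure, so the conditional variance of $\tilde M_N^{1/2} B_N$ is the weighted variance of $f$. The sketch below is a minimal Monte Carlo check under assumed, purely illustrative choices (Gaussian particles, uniform random weights, $f(x) = x$, and the sizes shown); none of these come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed fixed weighted sample (xi_i, omega_i), i = 1..M_N.
m_n = 500
xi = rng.normal(size=m_n)
omega = rng.uniform(0.5, 1.5, size=m_n)
probs = omega / omega.sum()

# With f(x) = x: weighted mean and weighted variance of f.
cond_mean = probs @ xi
cond_var = probs @ xi**2 - cond_mean**2

# Repeat multinomial resampling many times, keeping the weighted sample fixed.
m_tilde = 400
reps = 4000
idx = rng.choice(m_n, size=(reps, m_tilde), p=probs)

# B_N = resampled average minus its conditional expectation.
b_n = xi[idx].mean(axis=1) - cond_mean
scaled = np.sqrt(m_tilde) * b_n

empirical_var = scaled.var()
print(empirical_var, cond_var)
```

The two printed numbers should agree up to Monte Carlo error; the Gaussian limit in (64) is a statement about the distribution of `scaled`, of which this variance match is only one visible consequence.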



D Proof of Theorem 03 

Proof of the Theorem. To apply the Corollary we just have to check (53) and (54),
where $U_{N,i} = a_N \tilde M_N^{-1} f(\tilde\xi_{N,i})$ and $\{\mathcal{F}_{N,i}\}$ is defined by $\mathcal{F}_{N,0} = \sigma\{(\xi_{N,i})_{1 \le i \le M_N}\}$
and, for all $1 \le k \le \tilde M_N$, $\mathcal{F}_{N,k} = \mathcal{F}_{N,0} \vee \sigma\{(\tilde\xi_{N,i})_{1 \le i \le k}\}$. Noting that $\tilde\xi_{N,j}$ is
measurable for j = 1, . . . , Mn, we have 

Mn 

A N = ^{EfC/^j^i-i] -(EfC/^il^v^i]) 2 } 

i=l 
M N 

= ^ {E [U% ti j ^i-i] - (E [U Nti I ^-i]) 2 } 



i=M N +l 
,2 



a% M N - M N 



M N /M n 
i=l 



M N M N 

where the weights uiN,i are given in 1)21(1 . Note that 



A'., 



(65) 



\i=l 



M N -M N =i Egj [Mn^un^ 
M N M N 



Q N uiN,i- M N [M N Q N uj Ni \ . 
lun i = ~ — : 57 ~ : and 

By applying Lemma 12,

M N . =1 v ' 

— — — > o(f) 

m n 1 - m- 1 L^n^^iJ 

It remains to check © and ©. By Theorem 13 the weighted sample {(£jv,i> 1)} 
is consistent for (z/, C), which implies, 



a% 1 



A,iJ 



and thus © is satisfied. © is trivially satisfied. 

Lemma 12. Under the assumptions of the Proposition, for any $f \in \mathsf{C}$,



□ 



Afjv 



,L Mi/g)gj \ 
1/(1/$)$ / 



Proof. For any if > 1, denotes £>/<- = U^Lob ~~ + V-^l- Because the 

weighted sample $(£A,j)}i<i<iWjv is consistent for (z/, C) and Mn^I n uj^,i 

MnQ n u)N } i, we have for any / 6 C 

A/^ 1 [Mi^n^w^iJ /fe)i {Mi/*)w*,i e (a, 00) u ([0, k] n 



< 



i=l 



^ FT E ^/fe) 1 {MV*)*(6v,i) e (if, 00) u ([0, K] n B K )} 



/(Oi {MV*)$(£) e (if, 00) u ([0, A'] n B K )} v(dO 
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The RHS of the previous display can be made arbitrarily small by taking K 
sufficiently because f G {00} UN} v{dk) = 0. For any K > 
1, there exists r/ > such that for any a,b £M, 

l{|a -&/(V$)l [0,K] \B K }(La6J - [M 1 /*)**]) =0 

Combining the previous equality with 



M N 
M N 



' M N 



1 



i=l 



Hi 



NA 



and 



m^ 1 ^ LMV*)*(6r,i)J {MV*)*(0 e [o,k]\b k } 
LMi/$)*(0J 



i=l 



yields 



A/jv 



)J {^(i/$)$(o g [0, x] \ *?*} «/(de) 



i=l 



The proof follows by letting $K \to \infty$.



□ 



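The condition $\nu\{\ell \nu(1/\Phi) \Phi \in \mathbb{N} \cup \{\infty\}\} = 0$ in Proposition 5 and Lemma 12
is crucial. Assume that $\{\xi_{N,i}\}_{1 \le i \le N}$ is an i.i.d. $\mu$-distributed sample where $\mu$ is
the distribution on the set $\{1/2, 2\}$ given by $\mu(\{1/2\}) = 2/3$ and $\mu(\{2\}) = 1/3$.
Let $\nu$ be the distribution on $\{1/2, 2\}$ given by $\nu(\{1/2\}) = 1/3$ and $\nu(\{2\}) = 2/3$.
The weighted sample $\{(\xi_{N,i}, \xi_{N,i})\}_{1 \le i \le N}$ (i.e. where we have set $\Phi(\xi) = \xi$) is a
consistent sample for $\nu$: for any function $f \in B(\{1/2, 2\}) \stackrel{\mathrm{def}}{=} \{f : \{1/2, 2\} \to \mathbb{R},\ |f(1/2)| < \infty \text{ and } |f(2)| < \infty\}$,

\[
\frac{\sum_{i=1}^{N} \xi_{N,i} f(\xi_{N,i})}{\sum_{i=1}^{N} \xi_{N,i}}
\xrightarrow{P} \frac{(1/2) f(1/2) \mu(1/2) + 2 f(2) \mu(2)}{(1/2) \mu(1/2) + 2 \mu(2)}
= (1/3) f(1/2) + (2/3) f(2) = \nu(f) .
\]

In this example, $\ell = 1$ and obviously $\nu(1/\Phi) = 1$. Moreover,

\[
\nu\{\ell \nu(1/\Phi) \Phi \in \{\infty\} \cup \mathbb{N}\} = \nu\{\{1/2, 2\} \cap \mathbb{N}\} = \nu(\{2\}) = 2/3 \ne 0 .
\]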
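
The arithmetic of this example can be checked exactly. The following sketch recomputes $\nu$ from $\mu$ and $\Phi(\xi) = \xi$ with rational arithmetic, then evaluates $\nu(1/\Phi)$ and the $\nu$-mass of the points where $\ell \nu(1/\Phi) \Phi$ is an integer. The variable names are hypothetical, and taking $\ell = 1$ follows the statement above.

```python
from fractions import Fraction as F

# mu on {1/2, 2}: mu({1/2}) = 2/3, mu({2}) = 1/3.
mu = {F(1, 2): F(2, 3), F(2): F(1, 3)}


def phi(x):
    """Weight function of the example, Phi(xi) = xi."""
    return x


# nu is mu reweighted by Phi and normalised.
norm = sum(phi(x) * p for x, p in mu.items())
nu = {x: phi(x) * p / norm for x, p in mu.items()}

# nu(1/Phi), which should equal 1 here.
nu_inv_phi = sum(p / phi(x) for x, p in nu.items())

# nu-mass of the set where ell * nu(1/Phi) * Phi is an integer, with ell = 1.
ell = F(1)
mass_on_integers = sum(
    p for x, p in nu.items() if (ell * nu_inv_phi * phi(x)).denominator == 1
)

print(nu, nu_inv_phi, mass_on_integers)
```

The mass on the integers is $2/3$, so the condition of Lemma 12 is violated by this example.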

We will show that the convergence in Lemma 12 fails. More precisely, setting
$f(\xi) = \xi$, we will show that

\[
\frac{1}{M_N} \sum_{i=1}^{M_N} \left\lfloor \frac{M_N \, \omega_{N,i}}{\Omega_N} \right\rfloor f(\xi_{N,i})
\xrightarrow{D} 4/3 - (2Z)/3 , \tag{66}
\]

where $Z$ is a Bernoulli variable with parameter 1/2. This would imply that
$\tilde M_N^{-1} \sum_{i=1}^{\tilde M_N} f(\tilde\xi_{N,i})$ does not converge in probability to a constant.
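
Before turning to the proof, the two-point limit in (66) can be observed in a small simulation. The sketch below draws i.i.d. samples from $\mu$, sets $\omega_{N,i} = \xi_{N,i}$, and evaluates the floor-weighted average with $f(\xi) = \xi$. The sample size, the number of replications, and the function name `floor_part_average` are assumptions made for illustration.

```python
import numpy as np


def floor_part_average(n, rng):
    """(1/n) * sum_i floor(n * omega_i / Omega_n) * f(xi_i), with omega_i = f(xi_i) = xi_i."""
    xi = rng.choice([0.5, 2.0], size=n, p=[2 / 3, 1 / 3])  # i.i.d. draws from mu
    omega = xi                                             # Phi(xi) = xi
    counts = np.floor(n * omega / omega.sum())             # deterministic copy counts
    return float(np.sum(counts * xi) / n)


rng = np.random.default_rng(0)
values = np.array([floor_part_average(30000, rng) for _ in range(200)])

# Each replication should land near 2/3 (Z = 1) or near 4/3 (Z = 0).
near_low = np.abs(values - 2 / 3) < 0.1
near_high = np.abs(values - 4 / 3) < 0.1
print(near_low.mean(), near_high.mean())
```

Across replications the values split between neighbourhoods of $2/3$ and $4/3$ rather than concentrating at one point, which is the behaviour that (66) describes.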

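The LLN and CLT for i.i.d. random variables imply that

\[
M_N^{-1} \Omega_N = M_N^{-1} \sum_{i=1}^{M_N} \xi_{N,i} \xrightarrow{P} 1 ,
\]
\[
\begin{pmatrix} \mathbf{1}\{M_N^{-1} \Omega_N > 1\} \\ \mathbf{1}\{M_N^{-1} \Omega_N \le 1\} \end{pmatrix}
= \begin{pmatrix} \mathbf{1}\{M_N^{1/2} (M_N^{-1} \Omega_N - 1) > 0\} \\ \mathbf{1}\{M_N^{1/2} (M_N^{-1} \Omega_N - 1) \le 0\} \end{pmatrix}
\xrightarrow{D} \begin{pmatrix} Z \\ 1 - Z \end{pmatrix} ,
\]
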

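where $Z$ is a Bernoulli random variable with parameter 1/2. Since $\omega_{N,i} = \xi_{N,i} \in \{1/2, 2\}$ and $f(\xi) = \xi$,

\[
\begin{aligned}
\mathbf{1}\left\{ \tfrac{1}{2} < \tfrac{M_N}{\Omega_N} < \tfrac{3}{2} \right\} \frac{1}{M_N} \sum_{i=1}^{M_N} \left\lfloor \frac{M_N \, \omega_{N,i}}{\Omega_N} \right\rfloor f(\xi_{N,i})
&= \mathbf{1}\left\{ \tfrac{1}{2} < \tfrac{M_N}{\Omega_N} < \tfrac{3}{2} \right\} \left\lfloor \frac{2 M_N}{\Omega_N} \right\rfloor \frac{2}{M_N} \sum_{i=1}^{M_N} \mathbf{1}\{\xi_{N,i} = 2\} \\
&= \mathbf{1}\left\{ \tfrac{1}{2} < \tfrac{M_N}{\Omega_N} < \tfrac{3}{2} \right\} \left( \mathbf{1}\{M_N^{-1} \Omega_N > 1\} + 2 \cdot \mathbf{1}\{M_N^{-1} \Omega_N \le 1\} \right) \frac{2}{M_N} \sum_{i=1}^{M_N} \mathbf{1}\{\xi_{N,i} = 2\} \\
&\xrightarrow{D} (2Z)/3 + 4(1 - Z)/3 = 4/3 - (2Z)/3 .
\end{aligned}
\]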
The proof of (66) is concluded by noting that 1-
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